Until across European Languages: A Parallel Corpus Study

Abstract: We present a parallel corpus study on the expression of the temporal construction ‘not...until’ in a sample of European languages. We use data from the Europarl corpus and create semantic maps by multidimensional scaling, in order to analyze cross-linguistic and language-internal variation. This paper builds on formal semantic and typological work, extending it by including conditional constructions, as well as connectives of the type as long as. In an investigation of 7 languages, we find that (i) languages use many more different constructions to convey this meaning than was expected from the literature; and (ii) the combination of polarity marking (negation/assertion) strongly correlates with the type of connective. We corroborate our results in a larger sample of 21 European languages. An analysis of clusters and dimensions of the semantic maps based on the enlarged dataset shows that connectives are not randomly distributed across the semantic space of the ‘not...until’ domain.


Introduction
This paper is a corpus study of 'not. . . until' constructions across a sample of European standard languages extracted from the parallel text corpus Europarl (Koehn 2005). A typical Europarl example is (1).
(1) The guidelines are not implemented until the end of 2010. [sentence uttered before 2010]
Until in (1) is a temporal preposition linking two event phases: a negative pre-phase of not implementing the guidelines is followed by a positive post-phase of implementation of the guidelines. The change from the negative to the positive phase occurs at or shortly after the time denoted by the NP complement of until (the end of 2010). As a temporal connective, until can also link two clauses, as in (2).
(2) Naturally, Turkey cannot join the EU until all the criteria are met.
The speaker in (2) is a member of the European Parliament who argues that the situation of Turkey not-joining the EU will last until something happens that will lead to a change in state. Such clause linking is frequently encoded by not. . . until in Europarl, but we also find other means of expression, such as only. . . when in (3) or if in (4).
(3) Only when corruption has genuinely been eradicated in European countries should we try reverting to the imperious recommendations granted to various countries in the resolutions adopted, unfortunately, by us.
(4) Europe must mobilise the Solidarity Fund and we know that if the budget is not approved, the fund cannot be mobilised.
The examples report a change in state or a potential change in state. With the PP in (1), this is a purely temporal change from a negative phase to a positive phase.

Table 1. Set of paraphrases of not. . . until in (1)

Table 1 illustrates that the meaning conveyed by the original sentences in (1) and (2) can be expressed by various temporal connectives (UNTIL, BEFORE, WHEN, AFTER), exceptive phrases (WITHOUT) or conditionals (IF, UNLESS). Depending on the connective used, we find a negation in the main clause (UNTIL, BEFORE: NA), a negation in both main and subordinate clause (AS.LONG.AS, IF, WITHOUT, UNLESS: NN), or a focus particle (ONLY) in the main clause that combines with an affirmative subordinate clause (temporal or conditional: AA). Interestingly, the configuration AN is missing: there is no paraphrase in Table 1 that combines an affirmative main clause with a negative temporal clause. Examples with temporal NPs (rather than full clauses) have equivalents to the NA- and AA-construals, but not to the NN-construal. Table 1 illustrates that both temporal and conditional strategies are used. We know that these meanings are intertwined, for instance, in the use of English when as a temporal connective and a domain restrictor (see Farkas and Sugioka 1983 for discussion). Our corpus study looks at these overlapping domains from a new angle by investigating the distributional patterns within and across languages. In this paper, we will investigate to what extent grammatical paraphrases of not. . . until such as the ones listed in Table 1 occur in a range of European languages represented in the Europarl corpus, and what determines their choice language-internally and cross-linguistically. The research questions we will address in the paper are listed in Table 2.

[...] 1. cross-linguistic differences, 2. semantics and pragmatics, or 3. is it free variation?
(Q4) What is the relationship between temporality and conditionality in the 'not. . . until' domain?

The methodology relies on parallel corpus data, and we use multidimensional scaling as a statistical and visualization technique to reveal the patterns. This resembles the approaches in Wälchli and Cysouw (2012) and Wälchli (2018/2019), and has been dubbed Translation Mining by van der Klis et al. (2017). The methodology will be introduced in Section 3, but see van der Klis and Tellings (2022) for a more exhaustive overview. A special feature of this paper is that we do not only use Translation Mining to investigate cross-linguistic variation in a lexical domain (in our case, choice of connective), but also to study the co-occurrence of two grammatical markers: connective and polarity marking in main and subordinate clauses. These markers interact compositionally to determine the semantics and pragmatics of the 'not. . . until' construction. Hereby we contribute to the underexplored field of cross-linguistic variation with respect to compositional meaning (see von Fintel and Matthewson 2008 for the need to study variation of meaning composition). Finally, this work can be seen as connecting insights and methodology from the typological approach and the formal semantic approach.
Our corpus study proceeds in two steps that are based on two different multilingual datasets, named D1 (fewer languages, more parallel datapoints) and D2 (more languages, fewer parallel datapoints), both extracted from Europarl. We start in Section 3 with dataset D1, which is constructed based on information from the literature discussed in Section 2. It contains 7 European languages, which exemplify the main clusters of connectives found in Wälchli (2018/2019). The intermediate results of analyzing D1 in Section 4 reveal that there is stability with respect to the combination of connective and polarity pattern, as predicted by compositional semantics (research question Q2). Future vs. past time reference turns out to play a role in the balance between conditionality and temporality, and in that sense the Europarl data fill a gap in comparison to earlier discussions in the literature (Q4). Surprisingly, we find much more variation in connective choice than previous literature led us to expect (research questions Q1 and Q3). In order to deal with the large amount of variation, we created a second dataset D2 which contains fewer datapoints, but more languages. The increase in number of languages to 21 allows for more robust statistical testing of patterns of cross-linguistic variation and stability (Section 5). The analysis of D2 in Section 5 replicates the two main findings from D1 in terms of strategies (Q1) and compositionality (Q2). The larger set of languages reveals more language-internal and cross-linguistic stability in the data after all, and thus resolves some of the issues that arose after D1 (research question Q3). Before we proceed to the parallel corpus study, we provide a short background on the construction at hand from the perspective of the semantic and typological literature in Section 2.
Under the formal semantics of Kamp (1968), until is expected to combine with durative verb phrases like sleep in (6), but not wake up as in (5). The felicity of a telic verb like wake up in contexts like (5) has been taken to support three possible analyses: (i) a scopal ambiguity analysis of negation as an aspectual operator leading to durative phrases as its output (Smith 1974; Mittwoch 1977); (ii) a lexical ambiguity analysis in which there is a punctual until meaning 'before' functioning as a negative polarity item (NPI) next to the familiar durative until (Karttunen 1974); or (iii) a construction-based analysis in which 'until t' directly composes with not to mean 'from t onwards' (Hitzeman 1991) or 'only at t' (Declerck 1995).
All analyses claim to account for the fact that the event of the princess waking up actually took place, and that this was considered late (the speaker or the addressee might have expected it to occur earlier than nine o'clock). However, the authors follow different routes in building up the meaning of (5). De Swart (1996) uses a standard event-based compositional semantics and basic insights from the pragmatic literature about conversational implicatures to probe into approaches (i)-(iii) to not. . . until. She argues that they are equivalent in that they all derive the relevant meaning components of a sleeping phase followed by an awake phase that starts at, or shortly after, nine o'clock. What is part of the truth conditions under one analysis ends up being a conversational implicature under another one, and vice versa, so equivalence of meaning requires a combination of semantics and pragmatics. We do not replicate the argumentation, but refer to de Swart (1996) for technical details. We focus here on her suggestion that the different proposals are motivated by cross-linguistic variation in the expression of the particular meaning at stake. This idea is grounded in Karttunen's (1974) observation that Finnish would use ennen 'before' in configurations like (5), but clearly this connective is used to translate before in affirmative sentences. Hitzeman (1991) and Declerck (1995) might be inspired by the fact that German and Dutch use scalar adverbs like erst and pas in the translation of examples like (5), as we see in the Dutch example in (7). The adverbs erst and pas are focus particles similar to English only, except for the fact that they convey exclusion on a scale. So (7) excludes all times before nine o'clock as wake up times for the princess. We have encountered not. . . before and only. . . when as possible counterparts of the not. . . until configuration in Table 1, so the emerging hypothesis is that languages use different strategies to convey the meaning that English encodes as not. . . until in (6), and native speaker intuitions may have inspired the various authors to their respective analyses. The combination of the focus particle only and the temporal connective when in affirmative main and subordinate clauses profiles the positive post-phase, leading to a different balance between assertion and implicature than the configuration with not. . . until in (2). In contrast, the conditional sentence in (4) profiles the negative pre-phase by using negation in both clauses to convey the dependency of mobilization of the funds on approval of the budget.
One of the aims of this paper is to check whether parallel corpora provide empirical support for the idea that different languages use different grammatical strategies to convey the meaning of examples like (6). Research question Q1 in Table 2 is driven by typological investigations of NOT. . . UNTIL.

NOT. . . UNTIL from a Typological Point of View
Most typological investigations of temporal clauses address much larger domains such as subordination (Cristofaro 2003) or adverbial clauses (Kortmann 1997; Thompson et al. 2007; Hetterle 2015). Kortmann (1997: 185) offers the semantic map of temporal connectives in European languages in Figure 1 as a "simplified view of the TIME network". Its horizontal dimension can be read as a scale from anterior via simultaneous to posterior temporal relations and its vertical dimension opposes definite time (bottom) to indefinite time (top), where generalizing temporal clauses (contingency) are added. In Kortmann's map, UNTIL and BEFORE occur as adjacent domains, but 'not. . . until', as elsewhere in the typological literature, does not figure as a domain of its own.
Languages 2022, 7, 56

Figure 1. Semantic map of temporal clauses following Kortmann (1997, p. 185): WHENEVER | SINCE - AFTER - AS SOON AS - WHEN - WHILE - AS LONG AS - UNTIL - BEFORE (Latinate labels replaced by English labels and boxes replaced by lines).

The other languages of Wälchli's (2018/2019) sample all display some sort of overlap between markers in the 'not. . . until' domain with connectives that also express BEFORE, UNTIL or AS.LONG.AS, notably BEFORE in other Northern European languages such as Finnish and Danish, UNTIL in Western European languages and connectives not differentiating between AS.LONG.AS/UNTIL in many Eastern European languages. His special interest resides in Baltic languages, and he shows that all three patterns are found in this language group.
De Swart (1996) and Wälchli (2018/2019) investigate essentially the same meaning, but they do so from different perspectives. Wälchli shows that NOT. . . UNTIL emerges as a temporal domain of its own between UNTIL and BEFORE. The overlap with UNTIL or BEFORE aligns with two of the three analyses of NOT. . . UNTIL in the semantic literature reviewed by de Swart. The forms taken into account vary slightly across the two papers. Both discuss the configurations NOT. . . UNTIL and NOT. . . BEFORE, with a link to negative polarity. De Swart adds the configuration ONLY. . . WHEN as a strategy found in Germanic languages like Dutch and German. No language using ONLY. . . WHEN as a major strategy in the 'not. . . until' domain occurs in Wälchli's dataset, but he adds the configuration NOT. . . AS.LONG.AS. . . NOT, which was not part of de Swart's paper. Table 1 suggests that we further need to branch out into conditionality and exceptive phrases (identified by Wälchli 2018/2019 as minor strategies, but not dominant in any language of the sample).
This paper investigates all possible configurations to convey the meaning of examples like (1)-(3) and (5)-(6). As a representative set of languages that we expect to instantiate these configurations, for D1 we take Swedish (NOT. . . UNTIL NPI ), Finnish (NOT. . . BEFORE), English (NOT. . . UNTIL), Lithuanian (NOT. . . AS.LONG.AS. . . NOT), German and Dutch (ONLY. . . WHEN). We add French to the dataset, because it is unclear from de Swart (1996) what strategy this language adopts. In this study, we use Europarl to collect independent evidence for the generalizations made in Wälchli (2018/2019) in a different corpus.

NOT. . . UNTIL Constructions and Linkage
In Section 1 we introduced the idea that not. . . until links clauses and expresses a potential change in state. Many Europarl examples verbalize a particular kind of crossover of interests and control. In example (2) we see that the change considered necessary by the speaker is not controlled by the speaker, but by Turkey, as this country has to take action to meet the criteria. The utterance further suggests that if and when all the criteria are met, this will qualify Turkey for joining the EU. The Europarl corpus contains political content, and therefore it is not surprising that we find many examples illustrating this crossover of interests and control: one party desires one event to take place of which the other party is in control and vice versa. This phenomenon is called linkage (or iunctim) in political literature. Bow (2010: 3) defines political linkage as "efforts to break an impasse or otherwise improve one's bargaining position on a particular issue by tying it to another, unrelated issue". For a linguist, this statement is reminiscent of Lehmann's (1988: 182) definition of clause linkage: "a relation of dependency or sociation obtaining between clauses". If particular issues to be linked are articulated in clauses, political linkages can be expressed by clause linkage. Political linkage is a prototypical domain for 'not. . . until' constructions in the Europarl corpus.

The Problem of Bias
A parallel corpus investigation of NOT. . . UNTIL has two main advantages. First, corpus data in general are to be preferred over made up examples, because they provide us with meaning in context. If we have a large enough dataset, the similarities between datapoints give rise to patterns of language use that we can relate to grammar. Second, we do not need to define the meaning independently of the language under consideration. We trust the professional European translators to agree on the meaning that is expressed in the context at hand, and to select the appropriate expression to convey that meaning in the target language.
This assumption comes with the potential problem of translation biases, such as source language interference. This problem is not specific to Translation Mining, but applies to all methodologies that use parallel corpus data. Le Bruyn et al. (2022) discuss various traditions of parallel corpus research and how these methodologies deal with translation bias. In the Translation Mining approach, the problem is addressed in two ways. First, we focus on larger cross-linguistic patterns rather than individual words or sentences. Second, a parallel corpus study is generally followed up by monolinguistic corpus studies or experimental work to replicate results. Since the current paper is an initial study in this domain, we leave such follow-up studies for future research. In this study, we adopt two strategies that aim to minimize the problem. We check to what extent bias influences our results by investigating whether or not it is orthogonal to the research questions asked. The other strategy is to consider more than one set of datapoints, here Dataset 1 (fewer languages, more parallel datapoints) and Dataset 2 (more languages, fewer parallel datapoints).
Using a parallel text corpus, we explore the encoding of 'not. . . until' across languages by means of sampling datapoints that instantiate this domain. This is basically an onomasiological approach (from meaning to form). However, there is no way to find all meanings reflecting a domain directly in a large corpus otherwise than via markers that typically express it. Put differently, there must be one or several semasiological steps (from form to meaning) involved, which will introduce bias to one or several particular languages. Especially in D2, but also in D1, we make use of the fact that Swedish has a negative polarity item förrän that exclusively occurs in the 'not. . . until' domain. If we only sample datapoints in which förrän occurs, we will obviously miss other possible strategies in Swedish. This may result in an underrepresentation of diversity in the results, but not in an overrepresentation of diversity. We address the underrepresentation in D1 by adding search strings from other languages, at the cost of having to deal with bias toward several languages and several constructions and construction types. While adding increasingly more search strings from more languages will distribute the bias more equally across languages, it will be increasingly more difficult to control for the effects of bias induced. Using search strings from all 21 languages of Europarl would entail that the results for all languages were at least partly determined by the search strings (statisticians speak of overfit in such constellations). For the research questions asked in this study it is more important to be able to control for the bias than to distribute it evenly across the languages considered.
We argue that the bias towards Swedish is orthogonal to our research questions (Q1-Q4; see Table 2). Swedish is the only language with a negative polarity connective, so förrän does not correspond to any of the eight paraphrases in Table 1. If we find connectives reflecting these configurations in the other 20 languages, this cannot be due to the initial bias (Q1). Because Swedish förrän strictly goes with the NA polarity pattern, NA may be the preferred choice in parallel examples in other languages, but this preference cannot explain the occurrence or distribution of the NN and AA patterns (Q2). Swedish förrän cannot explain cross-linguistic and language-internal variation patterns in its translation equivalents in Europarl (Q3), and Swedish förrän, being a temporal and not a conditional connective, cannot explain the occurrence of conditional connectives as translation equivalents (Q4).

Dataset D1 and Annotation
Europarl is a parallel corpus of proceedings of the European parliament from 1996 to 2012 in 21 languages. In total, it has 759M tokens and 30.3M sentence fragments (we used Europarl version 8, distributed by OPUS; Tiedemann 2012). Of course, the number of languages in Europarl is more limited than in the Bible corpus, but we are not targeting a full-fledged typological analysis here. For dataset D1, we start with seven languages that correspond to the main clusters of connectives found in Wälchli (2018/2019). In the construction of dataset D1, we first searched for occurrences of (inte). . . förrän in the Swedish part of the corpus, and the corresponding translations in the six other languages. In this way, 130 datapoints were selected from a fragment of Europarl that only covered the years 2009 to 2011. This is already much more than the 25 datapoints for 'not. . . until' in Wälchli (2018/2019). This method does not allow us to study variation in Swedish, because only one construction is extracted by design. In order to also explore potential variation in Swedish, data were added based on extraction of the construction not. . . until in English (n = 30), and the Dutch constructions pas. . . wanneer (ONLY. . . WHEN) (n = 32) and niet. . . zolang. . . niet (NOT AS.LONG.AS NOT) (n = 33). All in all, D1 contains markers that reflect three of the eight paraphrases in Table 1. In addition to biclausal constructions, these data include combinations with a nominal or PP complement after until/zolang. In total, dataset D1 consists of 225 parallel datapoints.
The data were manually annotated using the TimeAlign software. In all 7 languages, the following properties were annotated (when there is a PP or nominal complement, the dependent clause fields were left empty):
1. connective
2. polarity in both main clause and dependent clause (whether the clause is negative or affirmative; in some languages we indicated extra distinctions such as expletive negation, etc.)
3. temporal/focus adverb or particle (if present, the adverb that is used in the main clause, such as German erst or Dutch pas)
4. clause type (syntactic information about the type of dependent clause: tensed clause, conditional clause, PP, etc.)
5. tense in both main and dependent clause (past, present, perfect, future, etc. as crosslinguistic tense categories)
6. clause order (whether the dependent clause precedes or follows the main clause)
For the categorization of the connectives, we relied on Wälchli (2018/2019). We learnt from this paper that modern Swedish förrän is a negative polarity expression that always co-occurs with negation, which inspired us to extract examples containing förrän in the first place. We know that Finnish ennen kuin occurs in non-negative contexts in which English uses before, so we categorize it as BEFORE. Dutch totdat, German bis, and French jusqu'à ce que were categorized as UNTIL, Dutch voordat, German bevor and French avant que as BEFORE, etc. Appendix A lists all the connectives in the dataset with their categorization in this paper.

MDS Semantic Maps
Once the data from the parallel corpus have been extracted and annotated, they can be used to create visualizations of cross-linguistic variation by means of multidimensional scaling (MDS). MDS is a statistical technique that reduces a complex dataset with variation in many dimensions to a lower-dimensional representation that can be displayed visually as a scatterplot, known as a semantic map. This methodology has been used both in large-scale cross-linguistic examinations, such as Croft and Poole (2008) and Wälchli and Cysouw (2012), as well as in studies comparing just a few languages, such as van der Klis et al. (2017). van der Klis and Tellings (2022) provide the technical background of MDS, an explanation of how to interpret MDS maps, and an overview of the application of MDS in linguistic theory. We refer the reader to that paper for a more comprehensive background than we can provide here.
The type of MDS map that is interesting for our purposes is a scatterplot in which the dots represent individual sentences (contexts) from the corpus. The algorithm places the dots based on a measure of similarity between contexts: similar contexts end up close together on the map, and dissimilar contexts end up far apart. This similarity measure is based on the annotation of the data: for example, when considering the annotation label 'connective', we can count two contexts as more similar when more languages use the same connective in both contexts. Table 3 illustrates with a fragment from a table of connectives used across languages in D2. It displays the connective used in 6 languages in 6 contexts from Europarl. On the basis of this table, a dissimilarity matrix consisting of all pairs of contexts can be calculated with Hamming distance as distance measure. For example, the pair of the first two contexts (written as <1,2>) has a dissimilarity of 1−3/6 = 0.5 since the connective is the same in 3 of 6 languages. It is hence more similar than the pair <5,6> where none of the connectives are the same, which results in the maximal dissimilarity value of 1.0. (See van der Klis and Tellings 2022: sec. 3 for more details on how dissimilarity matrices are constructed and used). The dissimilarity matrix determines the spatial configuration of the points in the map. This map is of linguistic interest: clusters of dots that are close together indicate that these dots are similar in the relevant sense, which invites further analysis of the linguistic properties of the corresponding sentences. The interpretation of clusters and dimensions is the main part of the interpretation of an MDS map. We will illustrate dimension and cluster interpretation of MDS maps in Section 5.
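The dissimilarity computation described above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used for the maps in this paper: the toy table is hypothetical (it is not the fragment in Table 3), and classical (Torgerson) MDS computed with NumPy is an assumption made here for self-containedness; the names `hamming_dissimilarity` and `classical_mds` are likewise illustrative.

```python
import numpy as np

# Hypothetical toy data (NOT the paper's Table 3): each row is a context,
# each column a language, each cell a connective category label.
contexts = [
    ["UNTIL", "UNTIL", "BEFORE", "WHEN", "AS.LONG.AS", "IF"],
    ["UNTIL", "UNTIL", "BEFORE", "IF", "UNTIL", "WHEN"],
    ["UNTIL", "BEFORE", "BEFORE", "WHEN", "AS.LONG.AS", "WHEN"],
    ["BEFORE", "UNTIL", "BEFORE", "WHEN", "AS.LONG.AS", "IF"],
    ["WHEN", "WHEN", "WHEN", "WHEN", "IF", "IF"],
    ["IF", "IF", "IF", "IF", "WHEN", "WHEN"],
]


def hamming_dissimilarity(rows):
    """Share of languages in which two contexts use different connectives."""
    table = np.array(rows)
    n = table.shape[0]
    diss = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            diss[i, j] = np.mean(table[i] != table[j])
    return diss


def classical_mds(diss, dims=2):
    """Classical (Torgerson) MDS: an assumed variant, chosen for simplicity."""
    n = diss.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (diss ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:dims]
    # Negative eigenvalues (non-Euclidean input) are clipped to zero.
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale


diss = hamming_dissimilarity(contexts)
coords = classical_mds(diss)  # one (x, y) point per context

# In this toy table, contexts 1 and 2 agree in 3 of 6 languages: 1 - 3/6 = 0.5.
# Contexts 5 and 6 agree in none, giving the maximal value 1.0.
print(diss[0, 1], diss[4, 5])
```

Plotting `coords` as a scatterplot would give a map of the kind discussed here, with similar contexts placed close together. Other MDS variants would start from the same dissimilarity matrix.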
In the literature that uses parallel corpus data to create semantic maps by means of MDS, typically only a single feature of a construction is annotated for, such as the lexical item used, or the tense of the construction (van der Klis and Tellings 2022 list many examples of such studies). A special feature of the current work is that we annotated multiple different properties in all languages, which makes it possible to not only study cross-linguistic variation for each feature individually, but also how a combination of features varies from language to language. For our purposes, the variation with respect to the feature 'connective' is important (providing information about (semi)lexical variation for not. . . until constructions), but also variation with respect to the interaction between the connective and polarity values, as the formal semantic analysis makes predictions about this interaction.

Variation in Connective Choice
In this section we present the main quantitative results obtained from the analysis of dataset D1, centered around the following two observations already anticipated here:

• Observation 1: variation with respect to connective choice is very high.
• Observation 2: compositional semantics/pragmatics of the interaction between negation and connective is respected in all languages.
The expectation that languages use different expressions to convey the same meaning, but do so using strategies that are semantically and pragmatically equivalent is met. However, the expectation that all languages use a single predominant strategy in the NOT. . . UNTIL configuration is not met.
Observation 1 breaks down into three observations. First, there is a high amount of language-internal variation. All languages have between 20 and 23 attested constructions in the dataset, so there is substantial variation in the constructions used within each language (as opposed to the four search strings from three languages we started with in D1). Second, there is widespread distribution over constructions, so there is no clear dominant strategy appearing in more than 50% of the datapoints, except for Swedish förrän. Note that the high numbers of Swedish förrän and English until are probably skewed because they defined the search criterion in 130 and 30 contexts of the dataset, respectively. The most frequent connectives for each language, in decreasing order of frequency, are: Swedish förrän NPI (145) >> English until (104) >> Lithuanian kol 'as long as, until' (80) >> Finnish ennen kuin 'before' (68) >> German wenn 'if' (58) >> Dutch zolang 'as long as' (58) >> French tant que 'as long as' (57).
Third, there is a high amount of cross-linguistic variation. Each context in D1 has translations in 7 languages, forming a 7-tuple (the rows in Table 3). We can count these translation tuples to assess the amount of variation. All translation tuples have low frequencies, so there is substantial variation in the combinations used across languages. In the set of 225 datapoints, the highest frequency of combinations of connectives across all seven languages is 6 (one tuple). We find another tuple with 5 occurrences, one tuple with 4 occurrences, three tuples with 3 occurrences, and 12 tuples with 2 occurrences; all other tuples are unique combinations of connectives.
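Counting translation tuples in this way is straightforward to sketch in Python. The rows below are hypothetical and serve only to show the procedure; the language order inside each tuple is an assumption made for illustration. The second part re-derives, from the frequencies reported above, how many contexts are left with a unique tuple, assuming those reported counts are exhaustive.

```python
from collections import Counter

# Hypothetical 7-tuples of connective categories, one per context.
# Assumed column order: Swedish, English, Finnish, Lithuanian, German, Dutch, French.
translation_tuples = [
    ("UNTIL.NPI", "UNTIL", "BEFORE", "AS.LONG.AS", "WHEN", "WHEN", "AS.LONG.AS"),
    ("UNTIL.NPI", "UNTIL", "BEFORE", "AS.LONG.AS", "WHEN", "WHEN", "AS.LONG.AS"),
    ("UNTIL.NPI", "UNTIL", "BEFORE", "AS.LONG.AS", "IF", "AS.LONG.AS", "AS.LONG.AS"),
    ("UNTIL.NPI", "IF", "IF", "IF", "IF", "IF", "IF"),
]

# How often does each complete tuple occur?
tuple_counts = Counter(translation_tuples)

# How many distinct tuples occur once, twice, and so on?
frequency_profile = Counter(tuple_counts.values())

# Frequencies reported in the text: frequency -> number of distinct tuples.
reported_profile = {6: 1, 5: 1, 4: 1, 3: 3, 2: 12}
contexts_in_repeated_tuples = sum(f * k for f, k in reported_profile.items())
unique_tuple_contexts = 225 - contexts_in_repeated_tuples

print(frequency_profile)
print(contexts_in_repeated_tuples, unique_tuple_contexts)
```

On the reported figures, 48 of the 225 contexts fall into a repeated tuple, which would leave 177 contexts with a combination of connectives that occurs only once.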
The most common connectives for each language are collected in Table 4, where we use colours and symbols to indicate categories. For each language, connectives are sorted by frequency (note that there is no horizontal correspondence between the connectives in the columns, each column is an independent list). We distinguish three main categories. The first category is the combination of one or two negations with a temporal, possibly NPI, connective: NOT. . . UNTIL NPI, NOT. . . UNTIL, NOT. . . BEFORE, NOT. . . AS LONG AS. . . NOT. The second category is the combination of a scalar or non-scalar focus adverb with a connective or preposition indicating temporal overlap or inclusion, temporal sequence ('after') or a condition, or a bare time adverb not introduced by a preposition (ONLY/PAS. . . WHEN/IF/IN/AT/ON/ONCE/AFTER/-). We set aside the exceptive clauses (EXCEPT) as a third category.

Connectives and Polarity
The compositional semantics/pragmatics of the interaction between negation and connective is respected in all languages. Table 5 summarizes how connectives combine with negation or affirmation in the main and subordinate clause. The main import of Table 5 is that most connectives exclusively combine with a single polarity pattern. All instances of UNTIL and BEFORE combine with negation in the main clause and an affirmative subordinate clause (NA pattern). Most instances of WHEN and IF combine with a focus adverb like ONLY or its scalar counterpart PAS, and have affirmative main and subordinate clauses (AA). The pattern AN is, with a few exceptions, not attested. Many of the exceptions have to do with problems of annotating negation across languages, such as dealing with lexical verbs that encode negation and with expletive negation; see Sections 4.5.2 and 4.5.3 for further discussion.
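A cross-tabulation of the kind summarized in Table 5 could be computed along the following lines. This is a minimal sketch over invented annotations, with "N" for negative and "A" for affirmative; the function names are hypothetical, and the counts are not those of the actual dataset.

```python
from collections import Counter, defaultdict

# Hypothetical annotated datapoints: (connective category, main-clause
# polarity, subordinate-clause polarity).
annotations = [
    ("UNTIL", "N", "A"),
    ("BEFORE", "N", "A"),
    ("ONLY.WHEN", "A", "A"),
    ("AS.LONG.AS", "N", "N"),
    ("AS.LONG.AS", "N", "A"),
    ("UNTIL", "N", "A"),
]

def polarity_table(rows):
    """Cross-tabulate connective categories against polarity patterns."""
    table = defaultdict(Counter)
    for connective, main, sub in rows:
        table[connective][main + sub] += 1
    return table

def single_pattern_connectives(table):
    """Return the connectives attested with exactly one polarity pattern."""
    return sorted(c for c, patterns in table.items() if len(patterns) == 1)

table = polarity_table(annotations)
for connective in sorted(table):
    print(connective, dict(table[connective]))
print(single_pattern_connectives(table))
```

Connectives that show up with more than one pattern (here the toy AS.LONG.AS rows) are the ones that call for closer inspection.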
According to Table 5, AS.LONG.AS frequently combines with negation in both the main and the subordinate clause (NN), but sometimes displays an NA pattern. The data extracted on the basis of Dutch zolang contain 13 constructions with an affirmative dependent clause (likewise, there are 9 NA German solange cases and 7 NA French tant que cases). An example of the NA pattern is given below:
Deze oorzaken zijn gelegen in de kapitalistische productierelaties en kunnen niet opgeheven worden zolang deze relaties blijven bestaan. [Dutch]
These causes, which are rooted in the capitalist relations of production, cannot be eliminated as long as these relations exist. [English]
Ces causes sont enracinées dans les relations de production du capitalisme et ne peuvent être écartées tant que ces relations existent. [French]
All 8 instances of English as long as and so long as in the NA pattern in the D1 dataset are translations of the 13 NA constructions that have Dutch as a source language (hence English appears in the NA section of Table 5). English as long as almost exclusively occurs with an affirmative subordinate clause in the NA pattern (and AA, but those are not present in the dataset), whereas Dutch, German and French zolang/solange/tant que are more inclined to tolerate negative subordinate clauses (NN). Dutch NEG zolang NEG constructions are translated into English with a variety of other constructions, such as not. . . until and not. . . if not. We conclude that English AS.LONG.AS has a polarity restriction that Dutch, German, and French AS.LONG.AS do not have. This might explain why AS.LONG.AS has not played a role in the formal semantic literature discussed in Section 2.1.
The polarity pattern of AA with ONLY.WHEN and ONLY.IF can be understood by considering the semantics of the focus particle only (or its scalar counterparts Dutch pas and German erst). The (enormous) theoretical literature on only states that only combines with a prejacent p, negates all non-entailed focus alternatives to p, and presupposes the truth of p (see Beaver and Clark 2008 for discussion and references). A number of technical issues arise in the analysis of only if conditionals when only and the meaning of the conditional are combined compositionally, but typically the semantic entry of only as described above is preserved (see Bassi and Bar-Lev 2018; Herburger 2019 for some recent work). We illustrate with (10), which asserts that Turkey cannot join under all non-entailing alternative conditions (they meet only some of the criteria, meet none of the criteria, etc.), and implies (presupposes) that Turkey can join if all the criteria are met.
(10) Turkey can only join if/when all the criteria are met.
Hence, the meaning components of not. . . until described in Section 2.1 are also encoded in only. . . if: the event taking place when the criteria are met, and the negative quantificational component that the event does not take place at times when the criteria are not met.
The marker UNLESS is infrequent in our dataset, but can be categorized as a conditional marker. English unless has been compared to and analyzed as only if not (Vostrikova 2018), another example of the lexical encoding of polarity and a focus particle.

MDS Maps
The stability of the polarity patterns can also be seen in the MDS maps in Figure 2, created in R (R Core Team 2020) using the smacof package. 4 Each symbol indicates a context from D1, and the measure of similarity used to construct the maps is based on both the connective and the polarity of the main clause. 5 The legend explains how the combination of color and shape indicates connective choice and the polarity of the main clause. Different colors correspond to categories of connectives: green for UNTIL, orange for WHEN, pink for AS.LONG.AS, olive for BEFORE, brown for IF, and red for AFTER (infrequent connectives are in gray). Shapes correspond to polarity: a square indicates affirmation, and a circle indicates negation. Each map in Figure 2 has the same configuration of points, but the colors/shapes reflect the language-specific marking ('map coloring' in the terminology of van der Klis and Tellings 2022).
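To make the idea of a similarity measure over contexts concrete, here is a small Python sketch. The exact measure used for the maps is not spelled out in this section, so the choice below (the share of languages in which two contexts receive a different connective-polarity label, i.e., a relative Hamming distance) is an assumption, and all data values are invented.

```python
# Each context is described, per language, by a (connective category,
# main-clause polarity) pair. All values below are hypothetical.
contexts = {
    "c1": [("UNTIL", "N"), ("AS.LONG.AS", "N"), ("BEFORE", "N")],
    "c2": [("UNTIL", "N"), ("ONLY.WHEN", "A"), ("BEFORE", "N")],
    "c3": [("ONLY.WHEN", "A"), ("ONLY.WHEN", "A"), ("IF", "N")],
}

def context_distance(a, b):
    """Share of languages in which two contexts are marked differently."""
    if len(a) != len(b):
        raise ValueError("contexts must cover the same languages")
    mismatches = sum(1 for x, y in zip(a, b) if x != y)
    return mismatches / len(a)

def distance_matrix(data):
    """Symmetric matrix of pairwise distances, in the order of sorted keys."""
    keys = sorted(data)
    return keys, [[context_distance(data[i], data[j]) for j in keys] for i in keys]

keys, matrix = distance_matrix(contexts)
for key, row in zip(keys, matrix):
    print(key, [round(value, 2) for value in row])
```

A matrix of this shape is the kind of input that a multidimensional scaling routine expects; contexts that are translated alike in most languages end up close together on the map.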
The fixed color-shape combinations reflect the stability in compositionality: orange (WHEN) symbols are invariantly squares, green (UNTIL) symbols are circles, etc. Comparing the maps pairwise, it is also useful to see how colors change from language to language. Consider the large cluster of green UNTIL points on the left side in the map in English. Most other languages have a different dominant connective type than UNTIL, e.g., BEFORE in Finnish and AS.LONG.AS in Lithuanian (matching with Table 4). The UNTIL contexts in English are translated by these different markers, and this can be seen by the different coloring for the points in this cluster. We will postpone further analysis of clusters of symbols in the map, and the interpretation of the dimensions, until the analysis of dataset D2 in Section 5, as the larger number of languages in D2 facilitates this process.

Temporal and Conditional Meanings
Table 1 listed not only constructions with temporal connectives, but also conditional constructions (if, unless in English, si in French, etc.). The overlap of temporal and conditional meanings in the domain of 'not. . . until' was not taken into consideration in the earlier formal (de Swart 1996) or typological work (Wälchli 2018/2019), 6 but emerges from the corpus analysis.
The conditional construction comes with additional restrictions over the purely temporal ones. The Europarl conditional example in (4) (repeated as 11a) has future time reference, and this seems to be crucial for using (only). . . if. We can reformulate (11a) with not. . . until, as in (11b), but we cannot rephrase the constructed past tense example in (12a) with (only) if, as illustrated in (12b-d). (12b) and (12d) are unacceptable (#), unless a quantified, 'whenever' reading is intended.
(11) a. Europe must mobilise the Solidarity Fund and we know that if the budget is not approved, the fund cannot be mobilised.
     b. The fund cannot be mobilized until the budget is approved.
(12) a. The princess didn't wake up until the prince kissed her.
     b. #The princess only woke up if the prince kissed her.
     c. The princess only woke up when the prince kissed her.
     d. #The princess didn't wake up if the prince kissed her.
The data in D1 were annotated for tense in both clauses, and we find that the IF cases generally occur with present tense on modal verbs (which have a future orientation, Condoravdi 2002), or with future tense. We conclude that with future time reference in both main clause and subordinate clause, the use of only. . . if is equivalent to not. . . until (11), but in the past domain, this equivalence does not hold (12). The incompatibility of only. . . if with past tense in (12b/d) is due to an additional grammatical property of the conditional marker, namely that it introduces a hypothetical clause. This is incompatible with factual past events. When in (12c) does not have this restriction, even though when and if are sometimes interchangeable in conditional constructions (Farkas and Sugioka 1983).
The formal semantic literature discussed by de Swart (1996) mostly dealt with past tense examples such as (12a), so the role of tense has not come up so far. We can understand the conditional realization of the 'not. . . until'-meaning in the Europarl corpus by looking at the phenomenon of (political) linkage, discussed in Section 2.3. The 'not. . . until' construction conveys that a change from a negative to a positive phase occurs at the time of the occurrence of some event. In contexts with reference to future events, and in particular in the political language of Europarl, this event is typically the fulfilment of a condition for the event described in the main clause. Hence, linkage is both temporal and conditional in nature. In conditional clauses, the protasis usually comes before the main clause ('if p then q'; Lehmann 1974; Comrie 1986), and a number of factors have been described that explain this ordering effect (Diessel 2001, 2005). However, 'not. . . until' constructions in European languages are more often expressed in the order q p. Why is this? The temporal order established by linkage is actually p q (e.g., in example (2) in the introduction: first meet all criteria, then join the EU). However, the order p q emphasizes the perspective on event p ('linker-perspective'), and for establishing linkage, it is often more appropriate to emphasize event q ('linkee-perspective'). The linkee is interested in q, not in p, so in order to arouse the interest of the linkee it is more useful to start with q: the desire of the linkee (or its denial) comes first. See Section 6 below for further comments about clause order.

Conclusions Based on D1
Dataset D1 confirms that the compositional semantics of the NOT. . . UNTIL configuration is respected in all languages, so the NN, NA, and AA patterns we established for English in Table 1 are also found in other languages. The different translations are equivalent in context, if we take meaning to be the combination of truth conditions and presuppositions/implicatures. Much to our surprise, the cross-linguistic patterns were not as stable as the discussions in de Swart (1996) and Wälchli (2018/2019) suggested. We expected languages to vary along the lines of the strategies outlined in Table 1, but we did not expect to find as much language-internal variation with as wide a spread as we observed in Table 4 and Figure 2. In order to achieve a better understanding of the amount of variation with respect to connective choice found in D1, we construct a new dataset D2 that is smaller in terms of parallel datapoints, is based only on a single initial search string (Swedish förrän) and is restricted to clause linkage (no PPs), but contains more languages. The increased number of languages makes it easier to find overall cross-linguistic patterns with statistical methods, and it allows us to replicate the findings reported in this section in a larger sample.
Before we move to D2, we discuss two practical issues we encountered in the annotation of negation in D1: lexical negation and expletive negation.

Lexical Negation
Negation can be expressed grammatically, but also by means of lexically negative verbs, such as English refuse in (13). However, a purely grammatical annotation of such verbs (as affirmative) leads to outliers in Table 1 and Figure 2, so in dataset D2 we made the choice to adjust the polarity, and annotate such configurations as NA.
(13) . . . telling Italians that the waste problem had been resolved, the European Union is doing the right thing by refusing to grant Italy funding until an environmentally friendly waste system based on recycling of waste and composting has been presented.
Put differently, English refuse will be counted as negation in D2, even though for this particular example, 16 of 21 languages in D2 have a grammatically affirmative construction with some sort of 'refuse'-verb. While the majority of examples treated with polarity adjustment in our database contain clearly lexically negative verbs in the main clause, such as Italian bloccare, English fail or French déconseiller 'advise against', there are less straightforward examples, such as Latvian klusējāt [be.silent.PST.2PL] 'you were silent', corresponding to English you did not speak out. Lexical negation is not an absolute phenomenon; it is always relative to a paraphrase with grammatical negation. That is, 'be silent' is the lexically negative paraphrase of 'not speak out', but it need not be lexically negative in absolute terms.
Aside from clearly lexically negative verbs, another relevant group of examples consists of phasal verb constructions, such as French continuera de violer 'will continue to violate', which is lexically negative relative to English there is no end to the violations.

Expletive Negation
Expletive negation is a phenomenon in which negation does not get its normal truth-conditional interpretation of logical negation. It appears in a variety of configurations, including comparative clauses, negative exclamatives, and UNTIL- and BEFORE-clauses (Espinal 2000; Greco 2020, and references therein).
In many languages, UNTIL- and BEFORE-connectives can be combined with expletive negation, as illustrated for Italian and Latvian with the Europarl data in (14) and (15). Formally speaking, the polarity values are NN (negation in both main and subordinate clause), but based on the connective we would expect the configuration NA here (as in English not. . . before). A question discussed in the literature is whether expletive negation has any semantic functions, counter to what the name "expletive" suggests. Espinal (2000) argues for Catalan that expletive negation is sensitive to veridicality, so non-factual and potentially non-factual examples are more likely to bear a negation marker. According to Wälchli (2018/ [...]
The practical problem when working with corpus data from different languages is that the majority of examples do not contain any clues to distinguish expletive negation by formal criteria. We therefore decided to annotate expletive negation as N (negative) in both datasets D1 and D2.
[...] of two language families (Indo-European and Uralic). This sample may be considered small compared to some large-scale typological work, and note also that the standard varieties sampled can deviate from non-standard varieties in systematic ways (Murelli and Kortmann 2011). However, our aim is not to capture the entire world-wide diversity in the 'not. . . until' domain, but rather how language-internal and cross-linguistic variation interact. We think that for this purpose Europarl is an appropriate choice of corpus.

Data Extraction and Annotation
Dataset D2 is based on only one source language, Swedish (as compared to 3 source languages in D1), because the NPI förrän is the simplest and most straightforward diagnostic for the 'not. . . until' domain. We selected 79 Swedish sentences (out of a larger set of 203) with förrän as a clausal connective and few missing translations in the 20 other languages.
The examples sampled were annotated for
1. Connective;
2. Polarity of main and subordinate clause; and
3. Clause order (subordinate clause post- or preposed).
We adjusted annotation for lexical negation, as described in Section 4.5.2 above. In total, 36 instances of lexical negation with adjustment were registered, only 5 of which were from subordinate clauses, and of which 16 cases relate to a single example, viz. (13) discussed above.
The notion of connective was applied very broadly. Restrictive adverbs and particles, such as German erst and English only, were included even if not adjacent to the connective, as were temporal adverbs or correlative elements, such as German dann in (16). As before, all annotations were made manually. Below, we will represent letters with diacritics by upper case letters, and spaces by equal signs.
(16) German: coded connective erst=dann=wenn; polarity: AA; order: postposed
Drittens sollten die Zeitintervalle erst dann beginnen, wenn alle Sprachversionen – ich wiederhole, alle Sprachversionen – zur Verfügung stehen.
'Thirdly, the periods should only start once all the language versions – I repeat, all the language versions – have been received.'
As in Section 4, we used multidimensional scaling to create semantic maps (using the function cmdscale( ) in R). In addition, principal component analysis (R: prcomp( )) was used, applied directly to the binarized crosstable of connectives (see Jolliffe and Cadima 2016 for a background on principal component analysis). The two procedures yield largely the same results, but an important advantage of principal component analysis is that it also indicates which connectives contribute most to the two poles of each dimension, which is very valuable for interpreting the dimensions.
Figure 3 displays the MDS plots of the first two (the most informative) dimensions for a selection of 6 languages; maps for the other languages in the sample can be found in Appendix B. As in Figure 2, every symbol stands for a parallel example and the configuration of examples is the same for all languages. The maps in Figures 2 and 3 differ in two respects. First, the maps in Figure 3 are based on a distance measure that only takes connectives into account (as in the example in Table 3 discussed above). Second, the coloring scheme is different: in Figure 3, the colors do not correspond with cross-linguistic categories, but with frequency. For example, in each map the red symbols indicate the most common connective in the language in question.
[...] 'before' (such as Finnish ennen kuin) and 'as long as' (such as German solange) can be found.
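The binarized crosstable of connectives that serves as input to the principal component analysis can be pictured with a short Python sketch. The analysis reported here was carried out in R; the code below is only an illustration of one-hot coding, with invented contexts and a hypothetical `binarize` helper, and the column naming scheme (language:connective) is an assumption.

```python
# Hypothetical connective codes for three contexts in two languages,
# using the '=' convention for spaces introduced above.
data = {
    "ctx1": {"de": "erst=dann=wenn", "en": "until"},
    "ctx2": {"de": "solange", "en": "until"},
    "ctx3": {"de": "erst=dann=wenn", "en": "only=when"},
}

def binarize(table):
    """One-hot encode a context-by-language table of connective codes.

    Returns the column labels (language:connective) and one 0/1 row per
    context, in sorted context order.
    """
    columns = sorted({f"{lang}:{conn}" for row in table.values()
                      for lang, conn in row.items()})
    index = {label: position for position, label in enumerate(columns)}
    rows = []
    for context in sorted(table):
        vector = [0] * len(columns)
        for lang, conn in table[context].items():
            vector[index[f"{lang}:{conn}"]] = 1
        rows.append(vector)
    return columns, rows

columns, rows = binarize(data)
print(columns)
for row in rows:
    print(row)
```

Because every column corresponds to one connective in one language, the loadings of a principal component on these columns can be read off directly as the connectives that characterize each pole of a dimension.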

Results for dataset D2
The discussion of the results will be divided into two parts. First, we see how D2 replicates our findings from dataset D1, reported in Section 4. Second, we will interpret the maps from Figure 3 by doing a dimension analysis (assigning a linguistic interpretation to the positive and negative poles of the two most important dimensions), and a cluster analysis (identifying and interpreting clusters in the maps).

Replication of Results in D1
The dataset D2 with more languages replicates the high amount of language-internal variation with respect to connective choice. Except for Swedish (only one connective because this was a sampling criterion), we find that the most common connective (marked with red squares) occurs, on average, only 35 times out of 79. Hence in many languages, there is no clear dominant strategy. As the legends in the plot indicate, even in the smaller set of 79 datapoints, we find 8 to 20 different strategies in each language.
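The two quantities used in this comparison, the share of the most common connective and the number of distinct strategies per language, are simple to compute. The following Python sketch uses invented labels for three languages and hypothetical function names; it does not reproduce the actual D2 counts.

```python
from collections import Counter

# Hypothetical connective annotations per language (one label per context).
annotations = {
    "sv": ["förrän"] * 6,
    "en": ["until", "until", "until", "only=when", "if", "before"],
    "de": ["solange", "erst=wenn", "bevor", "wenn", "erst=wenn", "bis"],
}

def dominant_share(labels):
    """Return the most frequent connective and its share of all datapoints."""
    connective, count = Counter(labels).most_common(1)[0]
    return connective, count / len(labels)

def strategy_count(labels):
    """Number of distinct connectives attested for a language."""
    return len(set(labels))

for language, labels in annotations.items():
    connective, share = dominant_share(labels)
    print(language, connective, round(share, 2), strategy_count(labels))
```

A language counts as having a dominant strategy in the sense used here when the share exceeds 0.5.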
The cross-linguistic stability of the combination of connectives with polarity patterns is confirmed in dataset D2. Figure 4 uses the size of circles (in the same configuration as the MDS maps in Figure 3) to indicate the frequency of polarity values averaged for all languages. The cluster in the bottom right can be identified as ONLY.WHEN (further discussion of clusters below), which shows a high proportion (large circles) in the affirmative-affirmative (AA) plot. The top right cluster can be identified as the IF cluster, and this has a large proportion of NN patterns. These correspond with our earlier findings (see Section 4). We now take a closer look at the configuration of dots in the maps in Figure 3.

Dimension Analysis
The functional domain that Figure 3 visualizes is semantically very narrow. Recall that it was selected on the basis of identical marking in one language, Swedish. Thus, for Swedish, all dots have the same color and the legend contains a single item: förrän. It can therefore be expected that semantic differences will not be particularly strong signals in the dataset and indeed the most dominant signal in Dimension 1 is the degree of conventionalization of a dominant marker. All languages in Figure 3 have red dots (always the most frequent connective topping the legend) in the negative pole of Dimension 1 in the most crowded area of the space. The plot in Figure 5 indicates by means of size of circles how rare connectives are on average across all languages in the dataset. We see that the smallest dots on the negative pole of Dimension 1 indicate the most prototypical 'not. . . until' contexts, where languages tend to use their most frequent markers. These  Table 6.
(17) English: most extreme ONLY.WHEN example (bottom right) [AA, postposed]
We will only be able to consider it actually over when employment has returned to pre-crisis levels.
(18) English: most extreme IF example (top right) [NN, postposed]
Mr. President, we will not achieve the objectives of the Europe 2020 strategy, neither will we make the economy more innovative and competitive, if we do not treat the Single Market holistically.

Cluster Analysis
Clusters can be identified by visual inspection of the maps, but a more systematic method to group examples into clusters is partitioning (for more on clustering analysis methods, see van der Klis and Tellings 2022). In Table 7, we use Partitioning Around Medoids (R: pam( )) with 5 clusters. This method clearly sorts out IF (Cluster 5), ONLY.WHEN (Cluster 4), and BEFORE (Cluster 3), whereas UNTIL and AS.LONG.AS cannot be easily split up by this method, perhaps because there are many conventionalized connectives which are both 'until' and 'as long as'. Clusters 1 and 2 do not distinguish UNTIL and AS.LONG.AS. Five rather than four clusters are used, because BEFORE does not appear as a cluster with k=4.
The location on the configuration of the MDS plot of the five clusters singled out by pam( ) with k=5 is shown in Figure 6.
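For readers unfamiliar with Partitioning Around Medoids, the core idea can be sketched in plain Python. This is a simplified illustration, not the algorithm as implemented in R's pam( ): it starts from the first k points instead of using a dedicated initialization phase, and the one-dimensional toy data are invented.

```python
from itertools import product

def total_cost(dist, medoids):
    """Sum of distances from every point to its nearest medoid."""
    return sum(min(dist[i][m] for m in medoids) for i in range(len(dist)))

def pam(dist, k):
    """Simplified Partitioning Around Medoids on a precomputed distance matrix.

    Starts from the first k points and applies the best cost-reducing swap
    until none is left. Returns (medoids, cluster label per point).
    """
    n = len(dist)
    medoids = list(range(k))
    cost = total_cost(dist, medoids)
    improved = True
    while improved:
        improved = False
        best = (cost, None)
        for position, candidate in product(range(k), range(n)):
            if candidate in medoids:
                continue
            trial = medoids[:position] + [candidate] + medoids[position + 1:]
            trial_cost = total_cost(dist, trial)
            if trial_cost < best[0]:
                best = (trial_cost, trial)
        if best[1] is not None:
            cost, medoids = best
            improved = True
    labels = [min(range(k), key=lambda c: dist[i][medoids[c]]) for i in range(n)]
    return medoids, labels

# Toy one-dimensional "map": two well-separated groups of points.
points = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
dist = [[abs(a - b) for b in points] for a in points]
medoids, labels = pam(dist, k=2)
print(medoids, labels)
```

Unlike k-means, the cluster centres are actual datapoints (medoids), which is convenient here because each medoid is a real corpus example that can be inspected.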
Only connectives with more than one occurrence per cluster are listed in Table 7.
In sum, the methods of multidimensional scaling, principal components and partitioning provide evidence that the functional domain of NOT. . . UNTIL is internally structured and not subject to entirely free variation.

Discussion
In this section, we address the research questions we formulated in Table 2 using the combined insights gained from datasets D1 and D2. We end with a brief discussion of the way our results fit into the larger context of cross-linguistic research by considering some additional issues.

Q1: Various strategies
D1 displayed much more language-internal variation than we expected, and this variation was replicated in D2. All strategies corresponding to the paraphrases in Table 1 were found in the cross-linguistic parallel text data, even though only three strategies were reflected in the search string for D1 and none in D2 (with Swedish förrän not corresponding to any of the strategies in Table 1).
However, variation is not infinite: (i) We find predominantly UNTIL, BEFORE and AS.LONG.AS; (ii) There are a few cases of alternative strategies revealing an extension into the domain of conditionals ((ONLY). . . IF next to ONLY. . . WHEN/ONLY. . . AFTER) and in the domain of exceptive clauses (NOT. . . WITHOUT); 8 (iii) Although none of the languages under investigation uses a single strategy to convey the 'not. . . until'-meaning, we find some languages in which one or two forms are used as dominant strategies.

Q2: Interaction of connectives and polarity
The interaction of connective type and polarity largely corresponds to the expectations set out in Table 1. There are exceptions, but they are all systematic. They can be accounted for by lexical negation (Section 4.5.2) and expletive negation (Section 4.5.3). Expletive negation is restricted to the UNTIL- and BEFORE-strategies. A third, arguable type of exception concerns 'only' containing a negative element, such as French ne. . . que 'only', as part of the ONLY.WHEN-strategy.

Q3: Language-internal and cross-linguistic variability
The analysis in Section 5, based on D2, suggests that variability across the 'not. . . until' domain can be explained by systematic language-internal and cross-linguistic variability. Two strategies, (NOT.)IF and ONLY.WHEN, are mainly due to language-internal differences. Minor exceptions are German, Dutch and Danish, where the ONLY.WHEN-strategy is slightly more common than in other languages (which is no surprise for Dutch and German, see Section 2.1). Some languages have one marking strategy that occurs in more than 50% of the data in D2. This is either BEFORE (Finnish and Danish), a specific dedicated marker (Swedish förrän, diachronically deriving from BEFORE), UNTIL (English and Spanish) or an underspecified AS.LONG.AS/UNTIL marker (Bulgarian, Slovene, Czech and Slovak). Other languages, including Portuguese, Latvian, and Polish, are mixed. So far the results are largely the same as in Wälchli (2018/2019) for data from the New Testament. Our results differ, however, for French, Dutch, German and Estonian, which are also mixed in the Europarl data. Notably, the relevance of the AS.LONG.AS-strategies in French (tant que) and Dutch (zolang) was entirely missed in both de Swart (1996) and Wälchli (2018/2019). As expected (see Section 4.5.3), expletive negation only occurs in BEFORE-, UNTIL- and underdifferentiated AS.LONG.AS/UNTIL-connectives.
In the data considered, conventionalization (dominant markers) is so strong that it is the major signal in the multidimensional scaling and principal component analyses. Hence, it is safe to conclude that a large part of the 'not. . . until' domain is strongly conventionalized in European languages, but all languages also have less conventionalized parts where language-internal variation occurs. Our results demonstrate that the encoding of the 'not. . . until' domain can only be properly understood if cross-linguistic and language-internal variability are both taken into account at the same time.
We have shown that the various strategies in Table 1 are not entirely synonymous. Yet, we cannot say either that different markers in the 'not. . . until' domain in European languages have neatly distinct meanings. There are no strict semantic borders across the domain and thus no strict absence of synonyms. The various strategies can safely be considered to be near-synonyms, since the semantic differences between them are entirely gradual: they differ in meaning only as a tendency. The two strategies at the extreme poles, IF and ONLY.WHEN, are somewhat more different, whereas BEFORE, UNTIL and AS.LONG.AS overlap to a larger extent. We have attested both "underdifferentiation" and "overdifferentiation" in this domain:
Underdifferentiation: In some languages, not all strategies can be distinguished. Lithuanian kol, for instance, means both 'as long as' and 'until'. Hence, the two strategies UNTIL and AS.LONG.AS are not easily distinguished.

Q4: Relationship between temporality and conditionality
Our results show that the 'not. . . until' domain not only hosts temporal, but also conditional connectives. Given the nature of political linkage discussed in Section 2.3, this need not come as a surprise. Dimension analysis in Section 5 indicates that we can map the data on a scale of more temporal expressions (e.g., UNTIL) vs. more conditional expressions. There is no strict borderline between temporal and conditional meaning in this domain, as we can easily understand a phase change of an eventuality e1 at the time of another eventuality e2 as e2 being a condition for the occurrence of e1.

Further Issues
Research questions Q1-Q4 do not in any way exhaust the range of issues that could be picked up. We illustrate this with a brief note on the order of main and subordinate clauses in the constructions under investigation. Figure 7 uses the size of circles to show the ratio of initial subordinate clauses, averaged over all 21 languages of the D2 sample. As can be seen, final word order strongly prevails and there is no obvious pattern of distribution of deviant initial order across the clusters. As many as 44 contexts never have initial order in any translation and the maximum value is 0.9 (there is no context where all languages have initial subordinate clauses). These findings agree with our hypothesis that speakers typically put the 'linkee'-perspective first in a configuration of linkage (see Section 2.3). However, it may also be the case that word order preference is biased by the choice of initial search strings.
Other questions not addressed in this paper are the relationship between 'not...until' and tense and aspect forms in main and subordinate clauses, and the great variability across languages of the expressions used in the ONLY.WHEN strategy.

Conclusions
In this study we have investigated the expression of 'not...until' in the Europarl parallel corpus in two datasets: D1 (7 languages and 225 datapoints) and D2 (21 languages and a more restricted set of 79 datapoints). We set out with a set of paraphrases, and our research questions concerned the ways these strategies are reflected in the cross-linguistic corpus data, how the different strategies interact with polarity, to what extent diversity is constrained cross-linguistically and language-internally, and what the interplay of temporality and conditionality is in the 'not...until' domain. In both datasets we found that languages have a bewildering wealth of different constructions to convey the 'not...until' meaning. Further analysis of dataset D2, based on the analysis of clusters and dimensions in semantic maps created by MDS, reveals that this variation is neither unlimited nor purely a matter of free variation. We were able to identify clusters of meaning corresponding to BEFORE, IF, and ONLY.WHEN, as well as a cluster of highly conventionalized expressions of the 'not...until' meaning. The interaction between connectives and polarity is stable in the sense that, cross-linguistically, categories of connectives combine with a single polarity pattern in the main and subordinate clauses (see Table 5), unless there is a specific reason for deviation, such as expletive negation. This aligns with predictions from the formal literature that different semantic encodings of the 'not...until' meaning are semantically/pragmatically equivalent, but originate in different lexicalizations of the construction (de Swart 1996). We have thus shown how an analysis of parallel corpus data can verify predictions about meaning composition made in the semantic literature. However, the corpus data not only confirm earlier predictions but also expand our perspective.
Many examples deal with possible future events in terms of linkage expressing a crossover of interests and control, contexts which have so far been largely ignored in the semantic literature.
To summarize, what we find is much more diversity than expected from the earlier semantic and typological literature, but also some very clear trends in how diversity is constrained both cross-linguistically and language-internally. Despite some obvious methodological difficulties in using translation data, we cannot see any way in which the results we obtained could be reached by other methodologies. Our study demonstrates that cross-linguistic corpus research is indispensable in semantic studies. Semantic studies cannot abstract away from cross-linguistic and language-internal diversity before having controlled for it, which presupposes empirical cross-linguistic and corpus research.

Appendix A
Below is a list of connectives attested in the D1 dataset and their categories in the various languages.

Notes
1. Some annotation conventions: for symbolizing polarity, we always use the order main clause-subordinate clause even if clause order is the opposite, as in (4). Negative connectives such as without and unless are coded as heading negative clauses. Multiple negative elements within a single clause are counted as contributing single negation in negative concord languages (e.g. French ne...pas). In some languages, the counterpart of only contains a (formally) negative particle, e.g. French ne...que; such expressions are coded as ONLY, and the clauses they occur in are coded as affirmative.
2. In Europarl it is not always clear what language is the original source language, and what language is the translated language (but see van Halteren 2008). We do not consider this to pose a major problem for our approach, since we are primarily interested in the inventory of expressions available in various languages.
3. TimeAlign is software that facilitates the annotation of parallel corpus data, and is available at https://github.com/UUDigitalHumanitieslab/timealign.
7. See https://termcoord.eu/2016/09/eu-languages-maltese-and-irish/.
8. The number of exceptive clauses in our corpus data is low, so further investigation is needed, but this falls outside the scope of this paper.