Tadeusz Chrostowski (1878–1923) is one of the expeditioners mentioned in the Brazilian Ornithological Gazetteer. According to Wikipedia, he conducted three expeditions in Brazil between 1910 and 1923. His first expedition took place in 1910 along the River Iguaçu, after which he returned to Poland in 1911; his second expedition ran from 1913 to 1915, when he returned to Poland upon news of the outbreak of World War I. According to [26], Chrostowski conducted his third expedition from 1921 to 1923. However, after extracting the spatiotemporal information from the 58 entries in which his name is mentioned, we were able to produce six expedition routes with a temporal gap of two months between consecutive expeditions (see Table 3). As the table shows, Expeditions II, III and IV, and likewise Expeditions V and VI, are close to each other in time. Based on this closeness, we suggest the following aggregations to arrive at three expedition routes only.
The Third Expedition of Chrostowski: 1921–1923
Straube and Urben-Filho mention that Chrostowski’s third expedition was carried out from 1921 to 1923 [26]. We used the expedition gazetteer texts that mention his name to test our spatiotemporal information extraction framework. We extracted the spatial and temporal elements from the text, created point features from the extracted spatiotemporal information, and then mapped the trajectory of the expedition route by connecting the extracted points chronologically. According to the extracted information, this expedition consists of 39 site visits (see Table 4), of which only 35 were identified as distinct; the remaining four were either visits to identical locations at different times (see the purple, blue and green colored records in Table 4) or mislabeled visits.
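The route construction described above (point features connected in chronological order, plus a count of distinct locations) can be sketched as follows. This is only a minimal illustration under our own assumptions, not the framework's actual code: the `SiteVisit` record, its field names and the helper functions `build_route` and `distinct_sites` are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class SiteVisit:
    """Hypothetical record for one extracted site visit."""
    name: str
    lat: float   # decimal degrees
    lon: float   # decimal degrees
    start: date  # first day of the visit


def build_route(visits):
    """Return the route as (lon, lat) vertices ordered by visit start date."""
    ordered = sorted(visits, key=lambda v: v.start)
    return [(v.lon, v.lat) for v in ordered]


def distinct_sites(visits):
    """Return the set of distinct (lat, lon) locations among the visits."""
    return {(v.lat, v.lon) for v in visits}
```

Comparing `len(visits)` with `len(distinct_sites(visits))` gives the number of repeat visits to an identical location, which is the kind of difference reported above between 39 site visits and 35 distinct ones.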
To assess the reliability of the extracted spatiotemporal information, we developed a simple algorithm that calculates the distance between two chronologically consecutive site visits and compares the result with a predefined average travel distance and temporal gap. Both thresholds rest on our assumption that people in the 1920s could travel 30 km in a day. We therefore set the average travel distance to 30 km a day: if the distance between two site visits is more than 30 km and the temporal gap is less than a day, the route is considered unreliable. After running the algorithm on the 35 distinct site visits, the routes through records 33 (light blue–colored), 17 (orange-colored) and 38 (light blue–colored) of Table 4 came up as unreliable. At this point, manual intervention was necessary to investigate the unreliability. Hereafter, for simplicity, let record 33 (light blue–colored) be point A, record 17 (orange-colored) point B and record 38 (light blue–colored) point C.
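A minimal sketch of such a check is given below. It is an illustration under stated assumptions rather than the algorithm actually used: the great-circle (haversine) distance, the dictionary keys `id`, `lat`, `lon`, `start` and `end`, and the definition of the temporal gap as the days between the end of one visit and the start of the next are all our own choices for the example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def unreliable_legs(visits, max_km_per_day=30.0):
    """Flag consecutive visit pairs that are too far apart for their temporal gap.

    `visits` is a list of dicts with hypothetical keys: id, lat, lon, start, end
    (start and end are datetime.date objects). A leg is flagged when the distance
    exceeds `max_km_per_day` and the gap is less than one day.
    """
    ordered = sorted(visits, key=lambda v: v["start"])
    flagged = []
    for prev, nxt in zip(ordered, ordered[1:]):
        dist = haversine_km(prev["lat"], prev["lon"], nxt["lat"], nxt["lon"])
        gap_days = (nxt["start"] - prev["end"]).days  # negative if visits overlap
        if dist > max_km_per_day and gap_days < 1:
            flagged.append((prev["id"], nxt["id"]))
    return flagged
```

Flagged legs are only candidates for the manual inspection described above; the sketch does not decide which of the two endpoints is the outlier.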
The distance as the crow flies from point A to B is about 535 km, and the distance from point B to C is another 531 km. The expeditioner visited these points as follows: point A (1923/01/23–1923/02/26), point B (1923/01/25) and point C (1923/02/27–1923/03/16) (see Figure 13). Assuming that 30 km could be covered in a day, the expeditioner would have needed about 18 days to travel from point A to B and another 18 days from point B to C. However, the extracted information implies that the expeditioner traveled through these points in a single day, which is very unlikely in 1923, assuming that expeditioners of that time traveled on horseback. Such an inconsistency may stem from mislabeling or from extracting wrong information. Considering this, the following scenarios are possible.
Scenario one: The extracted information might belong to the right expeditioner, but the description could be of another expedition; for instance, instead of 1923/01/25, the visit date might have been 1921/01/25.
Scenario two: The extracted information might have the right time and expeditioner, but the extracted location could be wrong. In this case, if point B were near points A and C, we could have considered the route through these points reliable.
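The 18-day figures quoted for the A to B and B to C legs follow from dividing each straight-line distance by the assumed 30 km daily range and rounding up to whole days. The short sketch below reproduces that arithmetic; the function name `days_needed` is hypothetical and the rounding-up convention is our assumption.

```python
import math


def days_needed(distance_km, km_per_day=30.0):
    """Whole days required to cover distance_km at km_per_day (rounded up)."""
    return math.ceil(distance_km / km_per_day)


leg_a_to_b = days_needed(535)  # 535 / 30 = 17.83..., rounded up to 18
leg_b_to_c = days_needed(531)  # 531 / 30 = 17.7, rounded up to 18
```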
If the unreliability of the route under assessment is caused by scenario one, the appropriate check is whether another expedition route has an attributive, temporal and spatial intersection with the assessed expedition at the outlier point (point B). Figure 13 shows the three expedition routes of Chrostowski, including the spatial intersection between Expedition III (red line) and Expedition II (blue line). Looking at the intersection points, we can infer that these two points were probably extracted from an identical location description and share an identical location. Hence, point B of the assessed expedition route must be an outlier either because the triplets were extracted from the wrong location description or because the description itself was wrongly written. To support this claim, we have to look at the description from which the triplets were extracted. The paragraph below is the visit description from which the information was extracted. According to this description, the outlier point was extracted correctly; “25 January 1923” is indeed there. The same description contains the phrase “although it was not mentioned by Chrostowski”, which casts doubt on the credibility of “25 January 1923”. The author of this description might therefore have made an attributive misreporting. Assuming that point B is an outlier that may belong to another expedition (Chrostowski’s Expedition II in this case), we excluded it from the assessed expedition route and modified the route by connecting points A and C. Figure 14 shows this modified expedition route.
“Ca. 900 m, on S side of Rio Iguassu [Rio Iguaçu, 2536/5436 (USBGN)], ca. 12 km SE of Curitiba [2525/4915 (USBGN)], Chrostowski, 22, 31 January, 11, 14, 19–20, 22 February, 15 March 1914, 10 February 1915[?], 25 January 1923 (Chrostowski, 1921:31–34, as “Affonso Penna”; 1922, Ann. Zool. Mus. Polonici Hist. Nat., 1:400, as “Affonso Penna”; Sztolcman, 1926:119); description places this near São José dos Pinhais [2531/4913 (USBGN)], although it was not mentioned by Chrostowski.”
The figure shows two expedition routes: the red line shows the route connected by straight lines passing through each point, and the blue line shows the same route connected along the road network passing through each point (for convenience, we used the present-day way-finding tool of Google Earth). We consider the route depicted by the blue line a reasonable representation of the third expedition conducted by Chrostowski during the period 1921–1923. We compared this route map visually (manual intervention was necessary at this point) with a reference map of the same expedition that was prepared manually by one of Chrostowski’s friends, Jaczewski, in 1925. The objective of this visual comparison was to confirm whether the framework is reliable and the spatiotemporal information was extracted correctly. The comparison focuses on the geometric and spatial similarity between the route we produced and the reference route. Figure 15 shows the reference route, which we colored blue to enhance its visibility and ease visual inspection, and Figure 14 shows the expedition route we extracted from the expedition gazetteer text. The figures show that the two expedition routes are geometrically and spatially similar. This resemblance gives us a compelling reason to believe that our framework is reliable and can be used to extract spatiotemporal information from similar expedition gazetteer texts.
Note: while reviewing the biography of the expeditioner (Chrostowski), we discovered a surprising fact. According to [27], Chrostowski died on 4 April 1923. However, the spatiotemporal information we extracted from his expedition gazetteer text shows that his last visit lasted from 5 May 1923 to 4 July 1923 (see Table 4, record 4), which contradicts his having died before this visit. The only possible explanation for this contradiction is that the site visit was misreported, which might have happened while the gazetteer was being written.