class: center, middle

# How to Catch when Proxies Lie
## Verifying the Physical Locations of Network Proxies with Active Geolocation
.authors[
Zachary Weinberg
·
Shinyoung Cho
·
Nicolas Christin
·
Vyas Sekar
·
Phillipa Gill
]
.invisible[*italic* and **bold** text to force fonts to load]
.institutions[
]

???

Good afternoon. I’m Zack Weinberg, and I’m going to talk about what you can do when you suspect your VPN servers aren’t where the VPN company says they are.

I’m a PhD student at Carnegie Mellon’s CyLab. This is joint work with two of the CyLab faculty, Dr. Nicolas Christin and Dr. Vyas Sekar, and also with Shinyoung Cho and Dr. Phillipa Gill from ICLab at the University of Massachusetts.
---

# Implausible claims

???

This is a verbatim quote from a major commercial VPN service’s website. I can’t fit all “190+ countries” on this slide, so I’m only showing the “Asia Pacific” subscreen of their list.

This list includes countries where it seems like it would be extremely difficult to rent space in a data center, sometimes for political reasons—North Korea—or more often because they’re tiny islands with fewer than 5000 inhabitants and I doubt they have enough bandwidth to support a data center. But then, if we don’t trust those claims, why should we trust any of the others?
---

# Implausible claims, audited

???

To spoil my own punch line: on the left, we have all of the countries where that service says they have servers. And on the right, where my audit says they really are. Darker green means more server IP addresses in that country. The rest of this talk is about how I know that.

But first I want to mention that this is an economically rational thing for the VPN company to have done. Most of their customers, when they select a VPN server in Ruritania, what they probably want is for websites to _think_ they’re surfing from Ruritania. Because Ruritania’s doing really well in the World Cup, and they want to watch, but the national TV channel only offers streaming video to people in the country.

That’s usually enforced with IP-to-location databases, but those are full of errors, and a lot of the sources they use could be faked. Whois, address registry allocations, airport codes in routers’ DNS names, that sort of thing. So, supposing the VPN company has a way to fake server locations in IP-to-location databases, then they can use that to offer lots more locations on paper, while maintaining only a few actual data centers, which obviously saves them money, and it also lets them locate those data centers in countries with cheap, reliable bandwidth, so their service comes off looking good on the benchmarks too.

But I’m not subscribing to VPNs to watch TV. I’m trying to monitor Internet censorship around the world, and it’s no good to me if the server that’s supposedly in North Korea is anywhere else. Even if it was right next door, in China or South Korea—both countries that also do Internet censorship—that would screw up my data collection.
---

# Active geolocation

???

How can we find out where proxy servers really are, without trusting any information that could be faked? The basic technique has been around for decades; it’s called active geolocation. It works on the same principle as the Global Positioning System, but instead of radio waves we use packet travel times.

We have _landmark_ hosts in known locations, say Bourges in France, Cromer in the UK, and Randers in Denmark. We ping the _target_ host from each, we find out it’s within 500 km of Bourges, 500 km of Cromer, and 800 km of Randers, we draw disks on the map, and we find out it’s gotta be in Belgium. Or maybe a couple places in southeastern Great Britain. We’re assuming it’s not on a disused anti-aircraft platform in this wedge of the North Sea here.

The basic _problem_ with active geolocation is, radio waves travel in straight lines at a constant velocity, but packets don’t. We may have routing delays, and we may also have what’s called “circuitous” routes, major detours from the great-circle distance—I’m told that packets often get routed from Australia to Japan by way of California, because those cables have more capacity. That’s 21 thousand kilometers’ worth of extra latency.

On the right I’ve plotted the relationship between delay and distance for pings from one of the landmarks I used to all of the others, and you can see there is a relationship, but it’s messy. For twenty years researchers have been experimenting with models for this relationship. CBG is one of the oldest and simplest: a linear travel time estimate, disks, geometric intersection. You can get much fancier than this.
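For the curious, here is a minimal sketch of the CBG idea in Python. Everything in it is illustrative: the landmark coordinates are the real towns, but the RTTs and per-landmark bestline parameters are invented, and a real implementation computes exact disk intersections rather than rasterizing onto a grid.

```python
# Illustrative CBG-style multilateration. Each landmark's bestline
# (slope, intercept) would normally be fitted so every calibration point
# (RTT, distance) to the other landmarks lies above the line; the numbers
# here are made up for the example.
import numpy as np

EARTH_RADIUS_KM = 6371.0

def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in kilometres (works on NumPy arrays)."""
    p1, p2 = np.radians(lat1), np.radians(lat2)
    dlat, dlon = p2 - p1, np.radians(lon2 - lon1)
    a = np.sin(dlat / 2) ** 2 + np.cos(p1) * np.cos(p2) * np.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))

# (lat, lon, RTT to target in ms, bestline slope km/ms, intercept ms)
landmarks = [
    (47.08,  2.40,  7.0, 100.0, 2.0),   # Bourges -> disk radius 500 km
    (52.93,  1.30,  7.0, 100.0, 2.0),   # Cromer  -> disk radius 500 km
    (56.46, 10.04, 10.0, 100.0, 2.0),   # Randers -> disk radius 800 km
]

# Rasterize the intersection of the disks onto a half-degree grid.
lats = np.linspace(-90, 90, 361)
lons = np.linspace(-180, 180, 721)
glat, glon = np.meshgrid(lats, lons, indexing="ij")
region = np.ones(glat.shape, dtype=bool)
for lat, lon, rtt, slope, intercept in landmarks:
    radius_km = slope * max(rtt - intercept, 0.0)   # invert the bestline
    region &= great_circle_km(lat, lon, glat, glon) <= radius_km

print("candidate grid cells:", int(region.sum()))
```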
---

# Octant

???

For instance, Octant estimates the minimum as well as the maximum travel time, using piecewise linear estimates based on the convex hull of the points on the scatterplot.
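A sketch of that convex-hull idea, assuming we already have calibration pairs of (RTT, distance) for one landmark; the hull’s lower and upper chains become piecewise-linear min- and max-distance estimates. This is my reconstruction of the principle, not Octant’s (or Quasi-Octant’s) actual code.

```python
# Piecewise-linear delay-to-distance envelopes from the convex hull of
# calibration points, in the spirit of Octant. Inputs are (RTT ms,
# great-circle km) pairs for one landmark.
import numpy as np

def cross(o, a, b):
    """Z-component of (a - o) x (b - o); its sign gives the turn direction."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def half_hull(points):
    """One chain of Andrew's monotone-chain convex hull."""
    chain = []
    for p in points:
        while len(chain) >= 2 and cross(chain[-2], chain[-1], p) <= 0:
            chain.pop()
        chain.append(p)
    return chain

def envelopes(rtts, dists):
    pts = sorted(set(zip(rtts, dists)))
    lower = half_hull(pts)               # min-distance estimate vs. RTT
    upper = half_hull(pts[::-1])[::-1]   # max-distance estimate vs. RTT
    return lower, upper

def bound_at(chain, rtt_ms):
    """Interpolate a hull chain at a given RTT."""
    xs, ys = zip(*chain)
    return float(np.interp(rtt_ms, xs, ys))
```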
---

# Spotter

???

And Spotter draws probability density functions instead of flat shapes on the map, based on cubic polynomial regression on the delay-distance relationship.

The papers describing the fancier algorithms often compare back to CBG and claim some percentage reduction in the uncertainty of the estimate, over the same test set. But the catch is they’re all tested on North America or Europe, and often only on PlanetLab nodes, which may have better connectivity than average for that area. There are reports that minimum distance estimates are unsound in China, because its network is always congested, so it’s not safe to say “this packet must have traveled at least this distance.” And hardly anyone has tested active geolocation on hosts that could be anywhere in the world.
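Roughly, the Spotter approach has the shape of the sketch below. The calibration data is synthetic, and the constant-variance Gaussian is a simplification of the real model; it is only meant to show the structure of the method.

```python
# Spotter-style probabilistic geolocation, heavily simplified: fit the
# mean delay-distance relationship with a cubic polynomial, treat the
# residuals as Gaussian, and score candidate locations by likelihood.
import numpy as np

rng = np.random.default_rng(0)
cal_rtt = rng.uniform(5.0, 300.0, 2000)                    # ms (synthetic)
cal_dist = 50.0 * cal_rtt + rng.normal(0.0, 500.0, 2000)   # km (synthetic)

coeffs = np.polyfit(cal_rtt, cal_dist, 3)   # cubic mean-distance model
sigma = np.std(cal_dist - np.polyval(coeffs, cal_rtt))

def density(rtt_ms, candidate_km):
    """Likelihood that a host candidate_km from the landmark produced
    this RTT, under the fitted Gaussian delay-distance model."""
    mu = np.polyval(coeffs, rtt_ms)
    z = (candidate_km - mu) / sigma
    return np.exp(-0.5 * z * z) / (sigma * np.sqrt(2.0 * np.pi))

# Spotter multiplies these per-landmark densities over a map grid to get
# the joint probability surface it draws.
```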
---

# Testing active geolocation around the world

???

So I did that test. I measured the precision of CBG, Octant, Spotter, and an Octant/Spotter hybrid—cubic regression but geometric intersection—on test hosts all around the world. I should mention that I had to reimplement all the algorithms myself from the descriptions in their papers, and Octant in particular got changed a bunch because it originally used traceroute information but we couldn’t collect traceroutes. So I’m going to be calling that Quasi-Octant from now on.

RIPE is the European IP registry and coordination forum; they run a measurement constellation called Atlas. There are two classes of measurement hosts, anchors and probes. Anchors are beefier hardware and guaranteed to have stable IP addresses. Probes aren’t guaranteed to have stable addresses, but a lot of them do. All the hosts have documented physical locations, and are pingable. They’re ideal for use as landmarks.

Their coverage outside of Europe could be better. For comparison, on the right is the Center for International Earth Science Information Network’s estimate of world population density as of 2015. Even if you scale that by Internet access, there’s still a huge discrepancy. There are measurement constellations that do better on North America, like CAIDA Ark, but I haven’t found one that does better on Latin America, Africa, or Asia. But there’s enough worldwide coverage to make this worth trying, at least.
---

# Testing active geolocation around the world

???

We calibrate all our algorithms on ping times from landmarks to landmarks, so we need a second set of hosts to be testing targets. We crowdsourced these: 40 from volunteers, 150 more from Mechanical Turk.

I was complaining about RIPE Atlas not having enough hosts in Latin America, Africa, and Asia, but it’s hard for a researcher based in the USA to get volunteers from there, too. Mechanical Turk lets you request workers from specific countries, which I used to prevent India and the USA from consuming my whole budget for this, but in many countries I didn’t get any workers at all. But, again, there’s enough to tell us something.
---

# Measuring RTT with a Web app

???

We couldn’t measure round-trip times with ordinary ping packets. The proxies we ultimately want to investigate are behind aggressive ingress filters, and our crowdsourced measurements had to be done with a Web application, which has a lot of restrictions on how it can talk to the network. All the gory details are in the paper, but the short version is, we have to use TCP handshakes, on a well-known port.

The client we used to study the proxies is written in C and can just connect to port 80 and time how long it takes the SYN-ACK to come back. One round trip. The Web application, on the other hand, has to pretend to be doing a legitimate HTTPS request. HTTPS to port 80 forces a protocol error, because the client’s speaking TLS and the landmark is expecting unencrypted HTTP. One or two round trips, depending whether the landmark is listening on port 80 at all; we don’t know that in advance, so we can’t tell which. So our crowdsourced tests are somewhat less precise across the board than the measurements of proxies. Please keep that in mind.
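In miniature, the handshake-timing measurement looks something like this Python sketch (the hostname is a placeholder, and the real client is C, not Python). `connect()` returns once the kernel has processed the SYN-ACK, so the elapsed time is one TCP round trip plus a little local overhead.

```python
# Time a TCP three-way handshake to a landmark's port 80, the same
# quantity the C client measures: connect() completes after the SYN-ACK
# arrives, i.e. after one round trip.
import socket
import time

def tcp_rtt_ms(host: str, port: int = 80, timeout: float = 5.0) -> float:
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        t0 = time.perf_counter()
        s.connect((host, port))
        return (time.perf_counter() - t0) * 1000.0

if __name__ == "__main__":
    # Take the minimum of several attempts to suppress queueing noise.
    print(min(tcp_rtt_ms("example.com") for _ in range(5)), "ms")
```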
---

# Measurement tool comparison

???

Now you are probably wondering whether we can trust the crowdsourced measurements at all. Well, the good news is, if you’re on Linux, and you get lucky and only measure one round trip with the web app, your RTT measurements are not statistically distinguishable from the command-line tool’s measurements, and it doesn’t matter which browser you use.

On Windows it’s a different story. The command-line tool hasn’t been ported to Windows, but I tried the web app in four different browsers, and _all_ of them are significantly slower than the same one on Linux—this is all the same computer, by the way—and it’s not just additional fixed overhead: the “one round trip” line in the middle plot has roughly the same slope as the “two round trips” line in the left plot. And they’re also much noisier. The entire right plot is rejected outliers from the same measurements as the middle plot. That dashed line is the cutoff; it has the same absolute slope on both.

We deliberately didn’t record the operating system that our crowdsourced testers were using, because we wanted to know as little about them as possible—they’re telling us their physical locations—but, you know, good odds many of them were on Windows.
---

# Algorithm comparison

???

So now that I’ve given you some reasons not to trust this comparison, here is how well the four geolocation algorithms, CBG, Quasi-Octant, Spotter, and the hybrid, do at predicting the locations of the crowdsourced test targets. We didn’t try to compensate for any of the problems I talked about in the last two slides at all; we wanted to see if the algorithms themselves were robust enough to cope.

These are all empirical cumulative distribution plots. For instance, this purple dotted line here on the left crosses one-half on the y-axis at ten thousand on the x-axis. That means, for half of the crowdsourced targets, Spotter’s prediction region’s nearest edge was less than ten thousand kilometers from the true location. … That’s actually very bad for Spotter. For what I’m doing, I want the true location to be _inside_ the prediction region, always. That way I can say “well, I’m not sure where inside this area the target is, but I can be absolutely sure it’s _not_ outside this area.” I want an algorithm that turns in a perfect performance on this plot: straight up from zero to one hundred percent at distance zero. None of the algorithms get there, but CBG comes closest. Incidentally, twenty thousand kilometers is half the circumference of the Earth, so Spotter’s as bad as it could possibly be for nearly a third of the targets.

In the middle we are looking at the distance from the _centroid_ of the prediction region to the true location. None of the algorithms do especially well at this, and none of them do any better than the others.

And on the right, we are looking at how big the prediction regions are. Obviously you want it to be as small as possible. A prediction that covers the entire land area of the Earth is saying “I have no idea where this is.” But for what I’m doing, that’s a better kind of failure. I would rather say “I don’t know where these VPN proxies are” than say “I am certain it is over there” and be wrong.

Notice that performance here is exactly reversed from performance on the “distance from edge to location” plot. The algorithms with bigger prediction regions are more likely to hit the target. That might sound obvious, but put it together with the big difference between CBG and the others: CBG doesn’t use a minimum distance estimate. And put it together with the method problems, the Web app possibly measuring two round trips instead of one, and Windows producing larger travel times than Linux on the same computer. I suggest what this tells us is CBG is more robust when the measurements might overestimate the distance to the target. This is what’s happening with the crowdsourced measurements, but it’s also what was reported to happen in China because of congestion, and going between Australia and Japan because of circuitous routes.
---

# Underestimation and CBG++

???

We looked for some small modifications we could make to CBG that would prevent it from ever missing the true location, perhaps at the expense of sometimes producing an even bigger prediction region, and we found them.

The key insight is that CBG can only fail to cover the true location when one or more of its disks _underestimate_ the distance a packet could have traveled. In fact, an underestimate can make it not produce any prediction at all—on the left, adding a measurement from Liechtenstein, the pink disk, that underestimates the distance means that there _is_ no region overlapped by all four disks. If it were just a little bigger, but still not big enough, it would be telling us that the server had to be farther to the southeast than it really was. Since we know the true locations of all the crowdsourced test hosts, we can calculate how often underestimates happen. The plot on the right shows the distribution of the ratio between the estimated and the true distance, for eight bins of the true distance. Overall, about 1% of disks are underestimates.

Our modified CBG, CBG plus plus, has its prediction region be the region overlapped by the largest possible subset of all the disks. If there are two or more overlaps among the same number of disks, we take the bigger one, so in the example we would throw out the pink disk and produce the same prediction as before. That ensures the prediction is always nonempty. To stop it being nonempty but still too small, we put a lower speed limit on the slope of CBG’s bestline, basically saying “we know it’s physically possible for a packet to travel half the circumference of the Earth in 240 milliseconds, so all bestlines must predict it could have gone at least that far in that time.”

We retested, and sure enough, those two changes eliminated all of the misses. So we used CBG++ for the main study, geolocating VPN proxies.
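In code, the two tweaks might look like the sketch below, reusing the grid-of-disks representation from the CBG sketch earlier. One simplification to note: where we keep the larger of two regions covered by equally many disks, this version keeps both.

```python
# The two CBG++ modifications, sketched. disk_masks is a list of boolean
# grids, one per landmark disk, as in the CBG example above.
import numpy as np

HALF_CIRCUMFERENCE_KM = 20037.5   # half the Earth's circumference

def cbgpp_region(disk_masks):
    """Region covered by the largest possible number of disks. Unlike a
    plain intersection, this is never empty, so one underestimated disk
    cannot wipe out the prediction. (If two regions tie, this keeps
    both, where the real algorithm keeps the larger.)"""
    coverage = np.sum(disk_masks, axis=0)
    return coverage == coverage.max()

def floored_slope(slope_km_per_ms, intercept_ms):
    """Lower speed limit on the bestline: it must allow a packet to
    cover half the Earth's circumference in 240 ms of RTT."""
    needed = HALF_CIRCUMFERENCE_KM / max(240.0 - intercept_ms, 1e-9)
    return max(slope_km_per_ms, needed)
```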
---

# Measurement through proxies

???

I need to tell you about a few more small details before we get to the results. First, there’s yet another hurdle to cross to be able to geolocate VPN proxies. The measurement we need for CBG’s disks is _A_ on the left diagram, the round-trip time between the proxy and each landmark. But we can’t measure that directly, because we can’t run code on the proxy itself, and it doesn’t respond to pings. What we _can_ measure is _B_, the round-trip time from our client _through_ the proxy to each landmark, and also _C_, the round-trip time from our client, through the proxy, and back to the client again.

In an ideal world, _A_ would be equal to _B_ minus half of _C_, because _C_ goes back and forth between the client and the proxy twice. A few of the proxies _can_ be pinged, so we use that to check this equation, and it holds up: linear regression says 0.49 _C_ with R-squared greater than 0.99.

This is roughly the same trick that Castelluccia et al. used to geolocate botnet command and control servers back in 2009, by the way.
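As a tiny worked sketch (the function names are mine, not the paper’s), the estimate and the sanity check against the pingable proxies look like this:

```python
# Estimate the proxy-to-landmark RTT (A) from the two measurable paths:
# B = client -> proxy -> landmark and back,
# C = client -> proxy -> client, which crosses the client-proxy link twice,
# so A ~ B - C/2.
import numpy as np

def proxy_landmark_rtt_ms(b_ms, c_ms):
    return b_ms - c_ms / 2.0

def check_model(b_ms, c_ms, a_direct_ms):
    """For proxies that answer pings we know A directly; regressing
    (B - A) on C should give a slope near 0.5 if the model holds."""
    slope, intercept = np.polyfit(c_ms, b_ms - a_direct_ms, 1)
    return slope, intercept
```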
---

# Performance optimization

???

There are 260 landmarks, and pinging all of them takes a while. But most of the measurements are ineffective: they produce a massive overestimate of the possible distance, which doesn’t reduce the size of the prediction region at all. Clearly taking those measurements was a waste of time.

Measurements from nearby landmarks are more likely to be effective, so we do two phases of measurement: first we use a few landmarks on each continent—one would be enough, but we have three in case some of them are down—to determine which continent the target is probably on, and then we use a random subsample of 25 landmarks _on_ that continent to narrow down the location. It’s actually a little cleverer than that: it picks landmarks in or near the prediction region from the continent phase. This cuts the time to scan one target from about ten minutes to less than a minute, at the price of greater variance in accuracy.
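Schematically, the two-phase scan could be written like the sketch below; `measure`, `locate`, and the region’s `most_overlapped_continent` method are hypothetical stand-ins for the real machinery.

```python
# Two-phase landmark selection, schematically. measure() returns RTTs
# from the chosen landmarks to the target; locate() runs CBG++ on them
# and returns a prediction region. Both, and the region's
# most_overlapped_continent() method, are hypothetical placeholders.
import random

def two_phase_scan(target, landmarks_by_continent, measure, locate):
    # Phase 1: three landmarks per continent (one would do; three in
    # case some are down), just to pick the right continent.
    coarse = [lm
              for lms in landmarks_by_continent.values()
              for lm in random.sample(lms, min(3, len(lms)))]
    rough_region = locate(measure(target, coarse))
    continent = rough_region.most_overlapped_continent()

    # Phase 2: 25 landmarks on that continent, ideally in or near the
    # rough prediction region, to narrow the location down.
    pool = landmarks_by_continent[continent]
    fine = random.sample(pool, min(25, len(pool)))
    return locate(measure(target, fine))
```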
---

# Disambiguation with external knowledge

???

Here’s an example of what I mean by greater variance in accuracy. The map on the left has prediction regions for 20 targets drawn on top of each other. They all belong to the same Autonomous System and the same /24, so it’s quite likely they’re all in the same location, but we get a pretty wide variety of shapes and sizes depending on exactly which landmarks were used for each measurement.

But I can run that logic the other way around, too: I can say “these hosts all belong to the same AS and /24, so they’re all in the same location, and all of the prediction regions cover at least part of Canada but only some of them cross into the USA, so let’s disregard the possibility that they are in the USA.”

This comes up a whole lot with data centers near national borders. We see this often for Canada, where most of the big cities are near the southern border, and for the many countries in Europe that are so small that you have to have a really tight prediction to not cross into any of their neighbors. And for city-states like Singapore. I actually called this the Singapore Problem in an early draft of the paper.

A related move is to remember that we’re geolocating _servers_. Servers live in data centers. The University of Wisconsin maintains a list of all the data centers in the world, with locations. That lets us say that, for instance, all the data centers inside the prediction region on the right are in Chile, not Argentina, so we can disregard the possibility that this proxy is in Argentina.
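The data-center filter reduces to a point-in-region test. In sketch form (the `region.contains` method is a hypothetical stand-in for whatever point-in-polygon test the region object provides):

```python
# Restrict a prediction to countries that contain at least one known
# data center inside the region. region.contains(lat, lon) is a
# hypothetical point-in-polygon test on the prediction region.
def plausible_countries(region, datacenters):
    """datacenters: iterable of (latitude, longitude, country) rows
    from a data-center location list."""
    return {country
            for lat, lon, country in datacenters
            if region.contains(lat, lon)}
```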
---

# Seven VPN providers

.caption[
VPN commercial landscape data collected by [VPN.com](https://www.vpn.com)
]

???

OK, that’s how I measured the locations of a bunch of VPN servers. Now, which VPN servers did I measure, and where did they turn out to be?

I’m not going to give the names of the VPN provider companies, because there are over 100 more companies I haven’t tested. I don’t want you to think the companies in this study are unusually misleading about their advertised server locations. On the contrary, I suspect this is an industry-wide problem, and if we tested all of the companies we could find, we’d discover at least some falsehoods for most of them.

But what I will tell you is that this slide shows 157 VPN providers, with the set of countries that they advertise servers in for each. The data comes from the comparison site VPN dot com. I’ve sorted this left to right by the total number of companies advertising servers in country X, and top to bottom by the total number of countries advertised by provider Y. So, provider A advertises the single greatest number of countries of any of these, B is second, and so on. A through E are all in the top 20 by number of countries advertised, and F and G are much more typical—think of them as the control group.

I should mention that when I say “countries” I mean “ISO 3166 country codes.” This includes both sovereign states like Canada, the USA, Chile, and Argentina, and also territories like Guam and Christmas Island and Pitcairn.
---

# Provider A

???

So now we come back to this slide I showed you at the beginning. Provider A claims to have VPN servers in all but seven of the world’s sovereign states. The left map shows the number of servers they claim to operate in each country—it’s actually a count of unique IP addresses, because many of their server domain names map to multiple addresses for load-balancing. We looked up all of the domain names in advance, from the same machine that would be used as the measurement client, and tested each address separately. So if they’re spreading load over more than one physical location, we’ll know. But if DNS lookups give different results from different locations, or if anycast routing was used to put multiple locations behind the same IP address, we wouldn’t know about that. Incidentally, none of these providers offers IPv6 connectivity, and the measurement client was in a data center in Frankfurt, Germany.

You can see a few of the countries where they don’t claim to have anything in gray on the left: Western Sahara, Mauritania, South Sudan. The others are all too small to see at this scale.

And on the _right_, this is where we think everything really is. Almost nothing in South America, or Africa, or Central Asia, or Oceania. Fewer than advertised in several other places. And this isn’t just a matter of its being difficult to operate servers in certain locations. There would be no problem getting hosting in Norway, or New Zealand, or Egypt, or Argentina, but they don’t; conversely, getting hosting in China and Russia is a hassle, but they do.

I can’t show it to you on this chart, but there is very little relationship between the claimed location and the actual location. Claimed locations from all over the world turn out to be concentrated into data centers in Florida, the UK, and the Czech Republic.
---

# Provider B

???

Provider B isn’t making claims quite as grandiose as A’s, but there are still quite a lot of lies, especially relating to South America, Africa, and Central Asia. And you might notice that they _didn’t_ claim to have anything in China, but something is being attributed there. I don’t know that one for certain; the prediction region overlaps both China and the claimed country, South Korea. But the possibility should worry you. I mean, if you think you’re avoiding Chinese surveillance when you aren’t, that’s really bad.
---

# Provider C

???

I’m going to go quickly through the rest of these; the overall patterns are much the same. C has some oddities, such as servers that they _advertised_ as being in the USA that we measure as being in Saudi Arabia, Iran, and China, which is precisely backward from what you would expect, and I don’t know what’s going on there.
---

# Provider D

???

D has a lot of IP addresses, but many of their servers are slow and overloaded, which increases the position uncertainty, and this is what that looks like: lots of spreading into neighbors.
---

# Provider E

???

At first glance it doesn’t look like E is lying very much, but they are; it’s just hard to see on the scale of this map. Look closely at southeastern Europe. Servers advertised as in the Balkans, Turkey, and Italy are concentrated into Germany … and Russia. Similar things are going on in Southeast Asia.
---

# Provider F

???

A hundred years ago, people might have started a war over servers being in Germany when they had been told they were in France. Joking aside, again we have servers moving in the opposite direction from what we might naively expect: France to Algeria. That may actually be position uncertainty muddying things up. It’s hard to represent uncertainty on these maps.
---

# Provider G

???

G doesn’t have all that many countries in the first place, and mostly seems to be telling the truth, but look what happened to Iceland.
---

# Summary

???

Here are two ways of looking at the actual results. On top is what’s called an alluvial plot, showing the mapping of alleged country to probable country for all the providers put together, and then, below that, what degree of uncertainty we have. For everything to the right of the big purple bar, 900 out of 2500 server IP addresses, we can say with certainty that it’s _not_ where it was advertised to be. The red flows, at the far right, are cases where it wasn’t even on the same continent. Blue and purple reflect different degrees of uncertainty: we’re not sure which country it’s in but it _is_ on the advertised continent, and it could be on more than one continent. Provider claims are fully credible for a little less than half of the tested IP addresses, and _could_ be true for nearly two-thirds.

But which countries account for the bulk of the credible claims? It’s the usual suspects: USA, Australia, UK, Netherlands, Germany, Canada, France, and so on. You can see that from the alluvia just by looking at which countries on the top row get the most green going into them. But on the bottom, I’ve visualized it a different way that’s hopefully easier to read. Remember a few slides back, the chart of the whole VPN provider ecosystem? This is just the rows for A through G of that chart, with the 20 countries most likely to be claimed expanded horizontally. The “honesty” score is what proportion of the IP addresses claimed for that country we found to be fully credible. So, again, it’s the usual suspects, but there are some curious wrinkles, such as this bluish-purplish blob here—these are all countries where it wouldn’t be hard to get server hosting, and yet, they lie about them. And they tend _not_ to lie about servers in Russia.

---

# What next?

.twocolumns[
.column[
## Improvements

* More VPN providers
* Continual monitoring
* Greater accuracy
  * more landmarks
  * more client locations
  * more use of network metadata
  * iterative refinement
  * server-to-server RTT clusters
]
.column[
## Questions raised

.tighten[
* Does this call research using VPNs into question?
* What do people think they’re buying?
* How easy is it to fake IP-to-location records?
* Could these measurements be interfered with?
* Should Web apps be able to measure precise network timing?
]]]

???

Obviously this is not the last word on VPN server locations; there are plenty of ways our results could be improved. We already have plans to expand to testing of more VPN providers, and do continual monitoring, and wrap that up in a consumer watchdog website—which _will_ name names. And we have several ideas for improving accuracy, starting with obvious things like adding landmarks and testing from different locations and making better use of network metadata, and moving on to fancier ideas like iterative refinement—keep adding landmarks to the measurement till the prediction gets small enough—and finding groups of servers with very short round-trip times among the group, which is a more principled reason for saying they must all be in the same location.

I want to end, though, with questions raised by just the work so far. To know if this is a clear or a fuzzy case of false advertising, we need to understand what VPN customers think they’re buying: if it’s just access to Ruritanian streaming TV, or if they expect their packets to truly get routed through Ruritania. How easy _is_ it to tamper with IP-to-location databases, the way we think they’re doing?
Could the VPN providers prevent us from locating their servers by tampering with packet travel times or something? And finally, remember I said I’d built a Web application that runs an active geolocation measurement? That could be used by a malicious website to locate a human without their permission, which means maybe Web apps shouldn’t be allowed to measure precise network timings…