Tuesday, 17 May 2022
FRANZISKA LICHTBLAU: Hello everyone, may I ask the candidates for the Programme Committee to make their way to the stage so that we can start the session on time:
Apparently we are still missing ‑‑ exactly, PC candidates in front of the stage, so if you volunteered for the PC, please come to the stage and we'll make a small introduction round.
Okay, people, please find seats. It is four o'clock and we would like to start with this session.
I was promised I would get the PC elections slide.
Okay, so, I would suggest everyone who is volunteering for the PC, please go on stage just for a second. And I will pass around a microphone and would ask you to just do a two or three sentence pitch on why you want to work with the Programme Committee. So. Who are you and why do you want that?
SPEAKER: Hi, I'm Dave Knight, operator, architect on UltraDNS at Neustar Security Services. I have been participating in RIPE for over 20 years, I was a Chair of the DNS Working Group until last year, and I have previously been on the DNS OARC Programme Committee. I really love RIPE, and have missed seeing everyone these past couple of years. And I hope you will give me the opportunity to make an already excellent programme even better. Thank you.
WOLFGANG TREMMEL: Hey everyone, I am Wolfgang from DE‑CIX academy. I am actually on the PC and running again and I have been on the PC now for two years, it was fun but this is actually the first physical RIPE meeting I am here as a PC member, and I like that so I would like to continue my work here on the PC for another two years.
SPEAKER: Hi, I am Max, I work at the Internet Society, I just want to be more active in the community and just I would like to have a more active role and that's why I applied for the Programme Committee.
SPEAKER: Hi. I am Magnus from German Edge Cloud, I want to be present on the Programme Committee to bring some young and fresh blood. I feel I'm ‑‑ I will be one of the youngest ones.
SANDER STEFFANN: Hi. I am Sander Steffann. I had a long stint of eleven and a half years as Address Policy Chair, and I was part of the first Programme Committee when we all started the thing up. I needed a break for a couple of years but I'm full of energy again, so I want to volunteer again to help make a nice programme for all you guys and ladies.
SPEAKER: Hi, my name is ‑‑ I work as a research engineer for SIDN, I have been on the Programme Committee before and I really loved it, so I would just like to be there again. Thank you.
FRANZISKA LICHTBLAU: Thank you very much. The voting for the PC is now open, everyone with a RIPE NCC access account can vote. We have two open seats and a bunch of candidates, so we actually do have an election going on this time. So, please take the chance to participate.
And with that, we start with our first presenter, who doesn't have a microphone, so I'm not sure how we are going to do this. I will pass this one.
SANDER STEFFANN: So, this is a little presentation about how we started to do an Anycast project at 6Connect.
We just had this idea and ‑‑ so, we're not going to go into too many technical details, this is just a story about how we started it, how it went, the road blocks we ran into, and just to give you an idea of what's possible, what's not possible and how to deal with all this.
So, we started 6Connect as a global company, we had some DNS servers, but we're like our DNS platform should be properly global, and the best way to scale that is using Anycast. So, Jan and I decided let's just do it.
So, at that point, we told the technical ‑‑ the commercial people in the company that we were thinking about doing this, and they were like, hey, this might actually be something that we might offer as a service in the future. So, we had backing from the commercial side. We were doing the technical stuff, and we just started off.
But how do you do that? How do you actually deploy an Anycast network? Like, we have lots of experience with BGP and routing and the standard things. So, you know, figure out what is actually different when doing Anycast.
So, build a prototype, set up some measurements, do some fine tuning, check if everything is okay and that's it, right? Can't be that hard.
So, first choice was like, okay, what software are we actually going to use? So, we looked at BIND, Knot DNS, NSD, PowerDNS, and we're like okay, what are we going to use for our authoritative platform? And we just decided to use all of them.
So, we used DNSdist, we have all four of the DNS servers serving the same zones in the background. We have DNSdist in front of it, and that way we can also compare the performance of the different backends by looking at the statistics in the front end and the measured latencies and the policies and things like this.
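A setup like the one described could look roughly like the following dnsdist.conf sketch. This is purely illustrative: the listen address, backend ports and backend names are assumptions, not the actual 6Connect configuration.

```lua
-- dnsdist.conf sketch: one front end, four authoritative backends serving
-- the same zones. Addresses, ports and names are illustrative assumptions.
setLocal("192.0.2.53:53")
newServer({address="127.0.0.1:5301", name="bind"})
newServer({address="127.0.0.1:5302", name="knot"})
newServer({address="127.0.0.1:5303", name="nsd"})
newServer({address="127.0.0.1:5304", name="powerdns"})
-- spread queries over the backends; per-server latency and query counters
-- can then be compared on the dnsdist console with showServers()
setServerPolicy(leastOutstanding)
```

With a policy like `leastOutstanding`, dnsdist keeps per-backend statistics, which is what makes the backend comparison described above possible.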
So, for routing, we're just using BIRD 2. Ansible for rolling it out because if you are rolling out a lot of servers around the world, doing that by hand will definitely cause you to forget something, so, all of that is automated.
And of course Jan was involved, so we needed to include some bash scripting as well.
So, the way we set it up, DNSdist provides scripting and monitoring, 6Connect has tools, so we use that as our provisioning system for the zones. It sends them over to the servers. We have a Python script that makes sure that all the backends are serving from the same zone file, and we wrote a small script called Damocles which hangs over the BIRD daemon, and if it notices anything going wrong in DNS, it just stops BIRD and that node will go out of circulation.
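The Damocles script itself is not public, so the following is only a hypothetical sketch of the watchdog logic described here: each node checks only its own DNS answers, and stops the local BIRD daemon when they go bad, so the node withdraws its Anycast prefixes. Function names and record names are assumptions.

```python
import subprocess

# Hypothetical Damocles-style watchdog sketch. Each node queries only
# itself; if its own DNS checks fail, it stops BIRD so the node drops
# out of the Anycast circulation.

def healthy(check_results):
    """True only if every configured test record resolved correctly."""
    return all(check_results.values())

def watchdog_decision(check_results):
    """Decide what to do with the routing daemon on this node."""
    return "keep-running" if healthy(check_results) else "stop-bird"

def apply_decision(decision):
    """On a real node this would stop the routing daemon."""
    if decision == "stop-bird":
        subprocess.run(["systemctl", "stop", "bird"], check=False)
```

Because the action is "stop announcing" rather than "fix DNS", a bad check on one node cannot directly affect any other node, which is the property discussed in the Q&A.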
A little bit more detail:
We started the first nodes. So, 6Connect already had some racks and equipment in different locations, so we started in Fremont, Ljubljana and Apeldoorn, these were our home towns. So, that's how we started.
Of course, if you want to do Anycast, you should actually get a proper spread around the world but, you know, for proof of concept we just started here.
But then we thought okay, how are we going to do this? Because if you just have one IP address, then if that node fails, all the latencies start massively shifting. Do we advertise multiple addresses on multiple nodes? What AS number do we use? Do we use a separate AS number for each instance or do we give each instance the same AS number? And in the end we decided we're using one AS number, three v4 prefixes and three v6 prefixes.
So, every POP now has three VMs. The first VM announces the first prefix as a primary address, and the second prefix as a backup address. The second node does the second and third address and the third node does the third and the first address.
So even if one of the nodes fails, the traffic still stays local and one of the other two nodes in the cluster catches the traffic.
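The rotation scheme just described (each VM announces one prefix as primary and the next one as backup) can be sketched as a small function. The prefixes below are documentation ranges, not the real ones:

```python
# Sketch of the announcement scheme: three prefixes, three VMs per POP,
# each VM announcing prefix i as primary and prefix i+1 (mod 3) as backup,
# so a single node failure keeps traffic inside the POP.

PREFIXES = ["192.0.2.0/24", "198.51.100.0/24", "203.0.113.0/24"]  # illustrative

def announcements(node_index, prefixes=PREFIXES):
    """Return the (primary, backup) prefixes for a node (0-based index)."""
    primary = prefixes[node_index % len(prefixes)]
    backup = prefixes[(node_index + 1) % len(prefixes)]
    return primary, backup

def serving_nodes(prefix, prefixes=PREFIXES):
    """Which nodes announce a given prefix, as primary or backup?"""
    return [i for i in range(len(prefixes))
            if prefix in announcements(i, prefixes)]
```

Every prefix is always announced by exactly two nodes, which is why losing any single VM never takes a prefix out of the POP.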
So that way, for example with maintenance, if we reboot one of the three nodes in the cluster, customers won't notice, because one of the other two nodes will just pick it up.
How do we do the failover? Well, like I said, the primary has a high priority and the secondary a low priority. If we have a cluster, for example, where we have our own BGP routing, we use iBGP with local pref, so that way we just have a clean failover.
But some locations we have VMs in a third party data centre, where we're not running the first hop router, so there we use eBGP with path prepending. That means if there is a failover, it is visible in the routing tables, but we tested it and it doesn't have a massive impact. So, good enough for now.
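In BIRD 2 terms, the two failover styles described here might look roughly like the sketch below. The prefixes, filter names and AS number are all assumptions for illustration, not the actual configuration.

```
# iBGP variant: primary prefix gets a higher local preference.
filter anycast_ibgp {
    if net = 192.0.2.0/24 then { bgp_local_pref = 200; accept; }   # primary
    if net = 198.51.100.0/24 then { bgp_local_pref = 100; accept; } # backup
    reject;
}

# eBGP variant (third-party data centre, no control over the first hop
# router): the backup prefix is AS-path prepended instead.
filter anycast_ebgp {
    if net = 192.0.2.0/24 then accept;                       # primary, as-is
    if net = 198.51.100.0/24 then {
        bgp_path.prepend(64500);                             # backup,
        bgp_path.prepend(64500);                             # prepended
        accept;
    }
    reject;
}
```

The trade-off matches the talk: local pref keeps failover invisible outside the cluster, while prepending leaks the failover into the global routing table but works without first-hop control.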
But then, things weren't properly working. We had some blackholes, we had some lost traffic, performance was really weird in some places. So, how do you debug this? Just putting some BGP boxes out there with DNS servers is easy, but what is going on? And why is traffic ending up on the other side of the world? Why do we have blackholes at some places? We just didn't know. So we looked at Route Views, we looked at RIPE Atlas, but it all wasn't detailed and targeted enough to actually debug this.
So, this was an issue.
Of course, even though we knew we had issues, we were like, well, we already know we're not having a good distribution around the world, so while we are figuring out the measurements, let's add a couple more nodes. So we added Tokyo at this point, and we just were brave and we have ‑‑ of course we didn't touch the 6connect.com domain name, because I think ARIN would kill us if we did that. But we have 6C Labs, so we put that on our own cluster just to have a first test to actually see what's going on.
So, like I said, we use 6Connect ProVision as our control centre. From there, the zones are pushed to the servers. Basic monitoring is just LibreNMS with SNMP and some plug‑ins. We do want to do some more measurements on the front end, because a lot of these are just looking at the public facing part, but, yeah, if at some point one of the backends starts performing badly, DNSdist wouldn't give it as many queries, so on the front end you wouldn't notice. Of course, as an administrator you do want to know that something is going on. There were some things here that could be improved.
And then we were like okay, while we're doing this Anycast thing, what else could we do? I mean, everybody always says, okay, UDP based protocols you can do over Anycast, and really short‑lived DNS queries you can do over TCP, but we were like, okay, how stable is the routing? What would be the effect if we would host other things on there? Can we host a database cluster on there? Can we do a web front end that is distributed around the world?
For example, spam filtering: SMTP sessions don't live very long, and if a session breaks, the sender just tries again. So, what other things could we do? Because this is just an initial proof of concept, and, you know, we can just experiment with it. This is not something that we have to be super careful about, because we don't have paying customers on it at this point.
So, for those things, we were also like, okay, not all services need to be in all locations. And how do you host it? Like, if you add an extra service ‑‑ we have of course a /24 in v4 and a /48 in v6, and we have one address for DNS. Could we host other things on there as well? But then you get into the problem that if one of the services fails, because they are all in the same prefix, you would have to pull the whole prefix and also pull all the other services. Do we actually need to run everything everywhere? Can we put some load balancers and caches in the Anycast locations and run the actual service in another location?
So we have been experimenting a lot with this. I don't have any results on this yet. These are just the thoughts that came up in our heads that we have been playing around with. But so far we're actually doing quite well. One of the bigger problems is usually trying, for example, to use Letsencrypt, because with Letsencrypt you need to create a specific file that needs to be served on port 80, and if you are doing that for an Anycasted setup, which node will actually get the query? So you have to build something for that as well. So those are things we have been working on and playing with.
Actually, this slide: Letsencrypt is a bit more complicated, but with some caching and scripting and proxying it's actually doable.
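One way to make HTTP‑01 validation work on an Anycast cluster, along the lines described, is to have every node forward ACME challenge traffic to a single central node that talks to Let's Encrypt. A minimal sketch, assuming nginx is the proxy and with a made-up central hostname:

```
# On every Anycast node: forward ACME HTTP-01 challenges to the one
# central node that requests the certificates. Hostname is illustrative.
server {
    listen 80;
    location /.well-known/acme-challenge/ {
        proxy_pass http://acme-central.example.net;
    }
}
```

That way it no longer matters which Anycast node receives the validation request; the central node answers it, and the issued certificates are then pushed back out to all nodes.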
But while we were having all these wild ideas, we still had this problem that we don't really know what's going on. Where are the blackholes? Why is Asian traffic ending up in the US? Why is European traffic ending up in Asia? What are the latencies? What are the distances? Why are these things happening and how can we fix them?
So, this is ‑‑ this turned out to be a bigger problem than what we could handle. So, we had a visit with Remco van Mook, who lives 15 minutes away from my house. Jan was visiting me, so we went to Remco for a drink and a chat, and we were chatting about this experiment we were doing and how we were playing around with this, and it turns out that Remco had actually just started a new company called Link State, which specialises in measuring network reachability and Anycast specific stuff ‑‑ exactly what we needed. And we also might have had some whiskey.
So, then we get to the point: what do we actually want? We want to know where our prefixes are visible. We want to see from all over the world exactly which clients end up at which server, we want to know what latency they have, which cluster they hit. We want to be able to optimise so that clients go to the cluster with the lowest latency from their point in the network. And of course, the blackholes: we had some places ‑‑ I think it was Croatia, Serbia, somewhere around there ‑‑ Serbia ‑‑ where just a whole country couldn't reach our Anycast network and we had no idea why.
So, this is something that we actually worked with Remco a lot and he will be the next presenter, so he will go into all the details of this.
But, yeah, these are actual questions that you don't have when you run a simple network with, you know, just normal routing, normal BGP. If you do Anycast, you suddenly get a lot of questions that are kind of familiar to network engineers, but the implications are completely different.
So, this is where we really had to learn and where we really got some interesting results. You'd think that one AS has one routing policy, but we found ASes where the northern part of the network ended up in Europe and the southern part of the network ended up in Asia, and the network itself was in the US. We had no idea why that network had different policies, but you could see them geographically divided, so there was something going on there ‑‑ you need tools for this. If you just want to deploy Anycast, like, if it's for a hobby project, sure, you just throw some nodes around the world and as long as nobody complains, it's good, right? But getting this right was actually a lot harder than we thought.
So, we ended up with a map a bit like this. In North America we had decent reachability, in Europe we had the Netherlands and Slovenia, in Asia we had Japan, and then we saw this map, which is actually coming out of Remco's system, I think. And we were like, yeah, this is a bit shaming, like we are definitely focusing on the northern hemisphere here, like the whole southern part of the globe is red.
So, at this point, we really started working with Remco and ‑‑ to basically just get the whole globe green and get good service everywhere.
This is actually my last slide. I want to jump a little bit ahead to after Remco's presentation, because at the moment we do have most of the globe green. Like I said, we had our own domain names running on this platform, eat your own dog food style, but we needed some more traffic to do some proper testing. So by now most of the globe is green and we're running the dot UA domain on this Anycast network. Dmitry is sitting here in the front of the room, and we are now running one of the real TLDs, so how is that for a test case?
At this point, I would take some questions, and if there are no questions on the concept, I'll hand it over to Remco to show how to debug a mess like this.
FRANZISKA LICHTBLAU: We start with Peter, who is properly queued.
SPEAKER: Peter Hessler from DENIC. You mentioned in your presentation that you are using a sword of Damocles style script. How do you make sure you don't cause a cascading failure and kill all your nodes at the same time?
SANDER STEFFANN: I see you are not the only one with that question.
So, each node only queries itself. We have a whole set of records that it checks. Theoretically, if we push a bad zone config to the Anycast Cloud, it could kill itself. But as long as the zone pushing mechanism is correct ‑‑ which we do actually check; we check and parse the zone files before we actually activate the config ‑‑ then every node is independent and can't kill any other node. So we do keep the testing very local, specifically for this. We should probably implement some extra checks. Folks, if your zone file shrinks by more than 50%, stop.
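The "if your zone file shrinks by more than 50%, stop" rule of thumb mentioned here is easy to sketch. This is purely illustrative, not the speaker's actual check:

```python
# Illustrative sanity check before activating a freshly pushed zone:
# refuse the update if the zone suddenly loses more than half its records,
# since that usually means a broken push rather than a real change.

def safe_to_activate(old_record_count, new_record_count, max_shrink=0.5):
    """Allow activation unless the zone shrank by more than max_shrink."""
    if old_record_count == 0:
        return True  # first push, nothing to compare against
    shrink = (old_record_count - new_record_count) / old_record_count
    return shrink <= max_shrink
```

Run before the zone config is activated, a check like this turns a bad push into a rejected update instead of a self-inflicted outage.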
So, at the moment, as long as the zone file is correct it will keep running, and for the dot UA stuff we are actually using IXFR updates, not even pushing the whole zone file. There is definitely room for improvement.
PETER HESSLER: I'll make a quick suggestion: some systems have seen issues where one node gets overpowered and fails the health checks, gets pulled, the neighbouring nodes get overpowered, fail their health checks, etc., etc. And that's a really nasty cascading failure case. So I recommend adding that to your to‑do list.
SANDER STEFFANN: What we did for that is we implemented rate limiting in DNSdist, but we white‑listed our own checks with the highest priority to make sure that they have the biggest chance to get through.
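A dnsdist configuration along those lines might look roughly like this sketch; the source ranges and the rate limit are assumptions, not the real values:

```lua
-- Sketch: rate-limit clients, but let our own health checks through first.
-- Addresses and limits are illustrative assumptions.
addACL("0.0.0.0/0")
-- health-check sources are matched first and always allowed
addAction(NetmaskGroupRule(newNMG():addMask("127.0.0.0/8")), AllowAction())
-- everyone else: drop clients that exceed 100 queries per second
addAction(MaxQPSIPRule(100), DropAction())
```

Because dnsdist evaluates rules in order, putting the allow rule for the health checks before the rate limiter gives them exactly the priority described above.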
SPEAKER: Erik Bais. Sander, on the blackholed networks, where it wasn't reachable, did you try other prefixes for the BGP announcements or did you just use one?
SANDER STEFFANN: Three different prefixes, both v4 and v6, and it actually turned out to be a firewalling issue on one of our sites. So, it wasn't even a BGP issue.
ERIK BAIS: Interesting.
SPEAKER: On the topic of Letsencrypt, you were talking about HTTPS validation; what work, if any, did you put into looking into the DNS‑01 based validation for certificates with them or other ACME CAs?
SANDER STEFFANN: So we could have done the DNS one, except our own zone is provisioned by our own backend software ‑‑
SPEAKER: You could use a CNAME, you could CNAME the ACME challenge out to another zone.
SANDER STEFFANN: Correct. But the biggest problem was the latency between requesting a certificate, updating the backend, and waiting for the propagation back to all the front ends. So, in the end, we just used ‑‑ I think it was nginx or HAProxy ‑‑ to just forward all the Letsencrypt traffic to one central node, request the certificates there and then push them out.
DIMITRY KOHMANYUK: Thanks for the work. Just a comment: I do like your statistics, you didn't mention that in the presentation. I have used many Anycast vendors and I did like the one you used.
SANDER STEFFANN: Thank you. I made those after making this presentation ‑‑ are you okay with me sharing some screenshots of that? Okay, so then I will make some screenshots and see if I can upload them so we can show them after Remco's presentation.
FRANZISKA LICHTBLAU: If you are really fast, we can do that.
SANDER STEFFANN: You know me, I can write presentations five minutes before the session starts.
FRANZISKA LICHTBLAU: Oh, I know. Next one.
SPEAKER: James. I have always been concerned about the potential for a rare corner case when doing TCP to an Anycast deployment where Eyeball Networks would be doing per‑packet rather than per‑flow load balancing... and breaking connections. Are you able to determine that this never happens?
SANDER STEFFANN: I'll leave that one to Remco.
SPEAKER: And in response to the previous question: over my 12 years of using TCP with Anycast, I never had such a problem. And to you now, Sander: prefixes of what particular size did you use, /24, /23 or what?
SANDER STEFFANN: So we have three /24s in v4 space and three different /48s in v6 space.
SPEAKER: All right, because /24s might be filtered by some networks. Anycast is fun; there was a time in my life when I realised that my Anycast failover actually failed over, so yeah, thank you for your talk.
SANDER STEFFANN: So far the /24s do propagate quite well. So we didn't run into problems there.
FRANZISKA LICHTBLAU: Okay. Thank you.
REMCO VAN MOOK: Can everyone hear me okay? Yes. Apparently people in live stream don't raise your hands because you are not here.
Good. So, let's wait for Meetecho to do its magic.
For those about to do Anycast. Or as Jan and Sander came to visit for whiskey, dazed and confused is more to the point.
A little bit about why I'm standing here today.
I had a frustration, and that's always a good reason to do something. And that is: there are a lot of ways to measure stuff on the Internet, but not a lot of ways to measure from a lot of places on the Internet. And if you want to do stuff like network optimisation, or even apply machine learning to your network configuration ‑‑ whatever kind of thing you think you might achieve out of that ‑‑ you need data, you need a lot of data, you need diverse data. Currently there are more than 10 billion devices on the network used by 5 billion people, and if you want to get statistically relevant data, you need to have an Internet‑scale viewpoint. And that means you are not looking for hundreds of viewpoints, or thousands. Ideally, you want to be able to look and observe what's happening from hundreds of millions of places.
And that's what we figured out. We found a way to take some of the existing frameworks from other companies and actually build a platform where we can leverage those and continuously measure network performance from a pool of hundreds of millions of end user connections. In fact, the current tally sits at around 250 million.
And as it so happened, one of our internal test cases was global Anycast network. So, let's talk about Anycast. Like, that's what we're here for, right.
What are the key challenges with Anycast? Anycast is a bit of a different beast. First of all, you need to make sure that any inbound Anycast traffic shows up at all. How do you know? You don't.
There are no blackholes, no loops ‑‑ you have to make sure of that. And unlike Unicast, you can't just send out a packet and sort of expect it to show up where you think it shows up. You have to make sure that your inbound traffic ends up in the right place, and in order to define the right place, you need to have a defined intent. So what are you trying to achieve? Which POP is going to serve which geography, or which specific ISP, or whatever? So you need to have that defined, and then measure against that and see how you run with that.
So you can determine that by geography, service level objectives, how fast is it, what kind of latency. But one of the key things to keep in mind as you are optimising Anycast, is you are effectively trying to traffic engineer the reverse path. It's not about your outbound routes. It is about how everyone else in the world decides to send traffic to you.
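"Defined intent" here could be something as simple as a mapping from region to intended POP, which observed measurements are then scored against. This is a toy sketch with made-up region and POP names, not Link State's actual model:

```python
# Toy sketch of an Anycast "intent": which POP should serve which region,
# and what fraction of observed measurements actually match that intent.

INTENT = {"EU": "ams", "NA": "fre", "AS": "tyo"}  # region -> intended POP

def intent_match(measurements, intent=INTENT):
    """measurements: list of (client_region, observed_pop) tuples.

    Returns the fraction of measurements that hit the intended POP.
    """
    if not measurements:
        return 1.0
    hits = sum(1 for region, pop in measurements if intent.get(region) == pop)
    return hits / len(measurements)
```

An "intent match" percentage like the one shown later on the dashboard falls out of exactly this kind of comparison between intent and observation.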
So, the dream of Anycast!
You have a bunch of nodes around the world, and they all have a particular reach, and happy days! So, European traffic goes to Europe. Indian traffic goes around India, Australian traffic definitely stays in Australia and so on. But, here is the real world, and this might be a horror picture for some people.
This is what a typical Anycast situation looks like. You have a massively outsized footprint for your European node, because your European node is connected to a bunch of Internet exchanges that have excellent route servers which propagate your visibility all over the world, doing a better job than most places in North America or South America. So, your European node will show up absolutely everywhere. At the same time, you end up with your Australian node, which you are actually trying to limit because it's far away from everything, but for some odd reason, like, half of Russia and the Caribbean are ending up in Australia, and ‑‑ I mean, I didn't put the sea cables into this map, but you can probably imagine that this is the worst possible outcome. You see that your node on the west coast of the US has its own footprint, which is fine, but it actually also takes over half of the traffic in South America and northeast Asia. So this is your problem to solve now, but you don't even know that this is your problem to solve, because this is what your monitoring is telling you: I have a bunch of nodes and they are online, and they are getting traffic, how amazing is that?
So, Jan and Sander came over and they were talking about their Anycast issues, and I was like, oh, so you mean something like this, right? So you want to know what kind of latency you are getting, you want to see if your network effectiveness is good enough, what's the excess distance that your traffic is travelling ‑‑ right, so how many extra kilometres is your traffic doing compared to an optimum situation? How much of it is actually matching the intent that you defined? And so on.
So, which POPs did users end up in during the last hour? This is a nice visual, and it's typically good for big screens rather than for engineering purposes, but you can see where this is going.
Interesting other thing. This is a histogram of round‑trip times. Anything over on the left‑hand side is good. Anything on the left side of this line, 50 milliseconds, is good enough for pretty much anything you want to do. But, as this is a global network ‑‑ this is data from my internal test case network ‑‑ here is this outlier at around 90 milliseconds for apparently Japan, and ‑‑ oh, no, that was Frankfurt. Here, this is my POP in Tokyo, and my POP in Tokyo is doing a lot of traffic that's clearly going to the other side of the world. That's not good.
So, other thing: How efficient is the routing that I'm observing? What am I seeing here? How many hops are the end users away from my Anycast POPs? Well, again, this is all very good. I mean, four or five hops is like the theoretical optimum, because I have an edge node, their ISP has an edge node, plus one or two internal hops. This is all amazing, this is all great. This is okay. And this is just wild. This is crazy. Also, this is Comcast. Beats me. I have no answer. I have no idea. I am just observing.
So, am I hitting my performance objectives? How much of my Anycast traffic ‑‑ how many of my measurements actually work? It's a decent number. 100% of attempted measurements is a good goal to have; anything over 99% is okay.
My average round trip, considering this is my gloriously unoptimised test case network, is actually not too bad. My intent match ‑‑ well, I may have sort of changed my intent to reflect what I was seeing, which is a bit of cheating, but...
Excess distance: I know where all of my POPs are, and I know with some level of certainty where the end users are, because of GeoIP lookups and that kind of thing. So I can look at: where is the end user? Which POP did they end up at? What are all the other POPs? And what is the difference in distance between where they ended up and where they should, or could, have ended up?
And, like, 300 kilometres, that's nothing, that's like a point‑something of a millisecond. That's fine. If this goes up to 2,000, then you have an optimisation issue.
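The excess-distance metric described here can be sketched as: great-circle distance to the POP the client actually hit, minus the distance to its nearest POP. A rough illustration using GeoIP-style coordinates (the POP locations below are just example cities):

```python
from math import radians, sin, cos, asin, sqrt

# Rough sketch of the excess-distance metric: how much further away is the
# POP a client actually ended up at, compared to its nearest POP?

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371 * asin(sqrt(h))  # mean Earth radius ~6371 km

def excess_distance_km(client, hit_pop, all_pops):
    """Distance to the POP actually hit minus distance to the closest POP."""
    nearest = min(haversine_km(client, pop) for pop in all_pops)
    return haversine_km(client, hit_pop) - nearest
```

A client in Frankfurt served from Amsterdam has an excess of roughly zero; the same client served from Tokyo carries thousands of kilometres of excess, which is exactly the kind of outlier the histograms above expose.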
More important for engineering, what's my homework? What is the stuff that I can see is going wrong?
So, here I have, for example, a whole bunch of ASNs in the US, and some of them are ending up in Frankfurt and Amsterdam instead of in London, which is okay, I guess. There is some interesting stuff. There is Rostelecom ending up in Tokyo, which is clearly not great. Comcast clearly loves to send all of its traffic to Miami. There are more POPs, but oh well!
And this is all nice, right. This is actionable. As a network engineer, you can log in every morning, look at your dashboard and see: this is my homework for the day, this is the stuff I need to go and optimise. But, really, we're engineers, we're lazy, we like to get this stuff done by computers. So that's the next step, where you want to take this. Because knowing which networks should go where can help you by saying, okay, so how do I do this? And some of it is the good old‑fashioned leg work of peering coordinators, sending e‑mails and getting on phone calls saying, hey, your network is behaving weird, can you fix it? Or: actually, we can help your customers get a better experience ‑‑ right, that always works really well. If only you could do this.
So ‑‑ but in order to do this, like I said, you need huge datasets, and continuous data feeds help, and this is what we're doing. You can try to do all of this with flow analysis on your network, but flow analysis really only tells you what's already happened, and if you are trying to optimise stuff, or you are trying to fix something in realtime, you don't have time to look at what happened yesterday. It might give you some insight, but you want to know what's happening right now.
So ‑‑ Anycast, as a network technology, is a little bit like Formula 1. If you get the settings for your Anycast network exactly right, it is absolutely incredible; it is by far the most optimised piece of network technology you can run to deliver content. The problem is, if you don't get it exactly right, things get pretty mediocre pretty quickly, and the environment is the defining factor. It's not about what you're doing, it's what everyone else on the Internet is doing to your prefixes, and let me tell you, they are up to no good. It's a bad world out there. And the only way you can actually see what's going on is with specialist tools.
So with that, any questions?
FRANZISKA LICHTBLAU: Thank you. So, I will start off with questions that we actually got in written form. So bear with me.
Aaron Weintraub from a small local ISP is asking: How much of this sort of problem is additionally self‑inflicted damage caused by some entities buying transit far out of market due to real or imagined pricing concerns? So South American buying in Miami, Asia buying in Los Angeles and so on.
REMCO VAN MOOK: That's actually an interesting question. Yes, to an extent, but one of the things that we have observed with one of our early stage customers for this is that what's actually even more important is the mix of who you are buying transit from as an Anycast provider, and who are your end users ‑‑ the end users that you are trying to serve ‑‑ who is their ISP buying transit from? So, interesting example: one of the things we wanted to optimise was a number of people playing some games, I'm not going to name any names, in Greece, and 90% of the traffic from Greece was ending up in Athens ‑‑ amazing, that's exactly where it should be ‑‑ and 10% ended up in Madrid, and if you look at the network topology of Europe, Madrid is probably the worst place for traffic to end up if you are starting from Greece. And why was that? Because of the transit provider, because the transit provider for that particular POP in Madrid was a one‑off, a different one than they were using elsewhere.
But it also happened to be the transit provider of one of the major ISPs in Greece, and that route was preferred over everything else, and there you go.
RUDIGER VOLK: (Still retired) During my operator's days, I remember some day when ‑‑ oh, Anycast users were having trouble, and, well, okay, I was thinking about it back and forth, and I figured out, well, strange thing: they are basing their optimisation of traffic on the routing system, which is probably driven by people with different objectives and circumstances optimising their own system. And, well, I figured out one should not be ‑‑ one should absolutely not be surprised that this can spell trouble, and it is certainly not a safe way of doing optimisation. And the question that Franziska read is kind of an extreme example of that, where even optimisation criteria kicked in that were far outside my network operator's optimisations.
REMCO VAN MOOK: And that is exactly why you can't ‑‑ if you are trying to do Anycast... I mean, no one says Formula 1 cars are particularly safe, right, but they are really fast, so it's really cool to have them. It's maybe not the same thing, but you want it to be as safe as possible. And in order to get it as good as you can, you need the data to tell you what the whole world is seeing. And I completely agree with you ‑‑ it's actually in my second slide ‑‑ you are trying to optimise the reverse path. You are trying to reverse ‑‑ you are trying to reverse someone else's routing.
RUDIGER VOLK: Yes. And actually, one of the most frightening things is you are really on shaky ground, because the optimisation and the objectives of the optimisation on the routing level may change at any moment without notice, and you cannot even blame the guys for not giving advance notice.
REMCO VAN MOOK: That's why you need real-time data.
FRANZISKA LICHTBLAU: Okay. Apparently people have found the Q&A feature that makes me read out a lot of stuff. So you need to wait a sec.
First one, by Maximilian: "Are you analysing the respective tier 1 upstreams of your ISP? In general it is key to have the same tier 1 upstreams everywhere for availability. BGP communities like 'do not announce to internal peers' are a very valuable resource."
REMCO VAN MOOK: That is all true. It's also not enough.
FRANZISKA LICHTBLAU: Okay. Let's take another one of those.
SPEAKER: Nominet. Is the tool you demonstrated a commercial product, Open Source, or something else? We would be really interested in getting data for our TLD from it.
REMCO VAN MOOK: I would love to talk.
FRANZISKA LICHTBLAU: You know how to reach him.
REMCO VAN MOOK: And for TLDs, I mean, Dimytry is actually using 6Connect, who are using this data to optimise dot UA, so...
SPEAKER: Tom Strickx, CloudFlare. Really cool talk and data visualisation. But am I understanding correctly that you are currently selling this, or are you providing this as a service to Anycast operators? What about offering that data to Anycast receivers, like service providers? Because, like you said, one of the issues that you are trying to debug is the reverse pathing, and providers like CloudFlare, for example ‑‑ we attach location‑based communities to all of our Anycast advertisements to make it easier for us to figure out where things are. We're not at a stage yet where we are publishing that globally, but that is definitely a state we want to be in. Is that something you are considering as well with your datasets?
REMCO VAN MOOK: We're definitely heading in that direction; we are really only just starting to take the covers off. We are just launching, starting to build this and go a little bit public about it. When Jan and Sander were visiting, this was still very much an under‑development, not‑to‑be‑disclosed kind of thing, and I was like: this is what you need. Definitely, and the use case for this is not uniquely Anycast, obviously ‑‑ ISPs are interested in what kind of performance they are giving to Cloud providers, Cloud providers are interested in what sort of performance they give to ISPs. You can basically name a business case.
FRANZISKA LICHTBLAU: Another question from the Internet ‑‑ you people have weird affiliations ‑‑ "How much percentage of the Internet content is really needed at external POPs? Isn't it only needed for global operations and therefore a minority of the total online content?"
REMCO VAN MOOK: I'm not sure I understand the question, and secondly, I'm not sure if I'm qualified to answer it.
FRANZISKA LICHTBLAU: Then rather abstain. So...
SPEAKER: When analysing the latency performance of an Anycast network from the client side, some of the outliers on the graphs could be explained by VPN servers, especially popular ones. So that's something maybe to look at. I did that before.
REMCO VAN MOOK: Yeah, I am keenly aware. So, I mean, this is a point about having a sufficiently large dataset, because when you have a sufficiently large dataset ‑‑ I mean, the VPNs, they stick out like sore thumbs; you can pick them up because you have a graph that goes like this and then there is this peak: that's a VPN provider. You don't know that until you have enough data.
SPEAKER: Correct. Secondly, I totally support and endorse the idea of designing Anycast networks with whiskey. Thank you.
REMCO VAN MOOK: I agree.
FRANZISKA LICHTBLAU: Okay. I think that was it for this presentation. So, thank you.
REMCO VAN MOOK: You're welcome.
SANDER STEFFANN: So, as Dimytry asked ‑‑ we put some stuff in Grafana. The way we did it was: we used dnstap to get the queries out. Then we used a collector that presents that as Prometheus metrics, which we then graph in Grafana.
So, the colour is a bit bad on this one. There is a purple line at the top, which is com dot UA ‑‑ so this is the first level basically; dot UA is split into several sub‑zones and we made a list of all the queries. Now, if you look at the bottom, we're getting CN, we're getting DE, we're getting net. We're not running the authoritative DNS for any of those. So, there is a whole bunch of weird DNS traffic ending up at our Anycast cluster, but you see at the top that com dot UA gets three and a half thousand queries per second at the moment. This data I screenshotted five minutes ago, so this is as good as live.
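The per‑zone bucketing described above ‑‑ grouping query names by their sub‑zone under dot UA, or by the stray TLD for traffic the cluster is not authoritative for ‑‑ can be sketched like this. This is a minimal sketch: the function names and query data are made up, and the real collector works on dnstap frames rather than plain strings.

```python
from collections import Counter

def zone_of(qname: str, served_tld: str = "ua") -> str:
    """Return the second-level zone under the served TLD (e.g. 'com.ua'),
    or the query's own TLD for traffic we are not authoritative for."""
    labels = qname.rstrip(".").lower().split(".")
    if labels[-1] == served_tld and len(labels) >= 2:
        return ".".join(labels[-2:])   # e.g. "com.ua"
    return labels[-1]                  # e.g. "de", "net"

def per_zone_counts(qnames):
    """Aggregate query names into per-zone counters, the shape of data a
    dnstap collector could expose as Prometheus metrics for Grafana."""
    return Counter(zone_of(q) for q in qnames)

# Made-up query stream: two com.ua queries plus stray off-zone traffic.
queries = ["shop.com.ua.", "mail.com.ua.", "example.de.", "host.net.", "site.in.ua."]
print(per_zone_counts(queries))
```

In the real pipeline these counts would be incremented per dnstap message and scraped by Prometheus, rather than computed over a list in memory.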
Then we dive in one level deeper, so we not only look at the dot UA zone as a whole, we actually have graphs where you can see all the different sub‑zones, and what ‑‑ like, this is a top 50 I think, or top 25, so ‑‑ and it's interesting to see some zones, like the purple one that is at the bottom, then suddenly it goes all the way up to the green lines and then drops away again. So there is suddenly a zone that is very briefly very active and then drops away. So, if you want some interesting statistics, you can do quite some nice research based on this.
So this is all the zone data. Then we also have some more generic cluster‑wide information: type of query, response code that we send out. We show the latency ‑‑ so this is the latency between the query coming in and us sending the response out. So we monitor our own latency there. And at the bottom, the graph is actually showing the different Anycast sites we have. APE is Apeldoorn, my own site down in the Netherlands, and of course that is quite close to Ukraine, so it's not surprising that that one actually gets most of the traffic for this.
Then we have the cluster of three, so NS1 is the first prefix, NS2 is the second prefix and so on, and you can see that actually, for some reason, NS2 is getting slightly more traffic than the rest. So, yeah, it's interesting that not every prefix gets exactly the same amount of queries. You'd think with 2,000‑3,000 requests per second it would equalize a bit more, but this is what we're currently seeing.
DIMITRY KOHMANYUK: It gets less traffic because I removed v6 addresses because we had connectivity issues, I'll fix it later.
SANDER STEFFANN: I forgot to change the title on this slide. This is actually the ASNs that are sending the queries. So, we're getting a lot of queries coming from Google, and then there are some different ones that have a very interval‑based pattern. No idea what they are. But, you know, if somebody feels like this is an interesting research subject, ask me or Jan or Dimytry and we can see if we can give you some data.
And that was it. Thanks.
JAN ZORZ: You still want to ask a question? After the two‑minute slot. I would like to apologise to Dimytry and share a deployment story. What you saw here in these graphs, this tool ‑‑ we developed it, and he was continuously saying "I can't reach it, it times out", and we were testing it, and it worked from all our locations, and then it turns out it doesn't work over IPv4. We didn't even notice it. Sorry. We will fix it.
DIMITRY KOHMANYUK: I was in Greece and I got residential v6 and that was magically working, so all hail Greece for finally doing something better than many other countries, including Ukraine ‑‑ we have very little v6 deployment. But please, Pavlos Sermpezis, please come up.
PAVLOS SERMPEZIS: "Bias in Internet measurement infrastructure" ‑‑ it's uploaded, I checked it, but because we added one more lightning talk which was not on the programme as of yesterday, we got kind of an off‑by‑one error. But we have got time.
PAVLOS SERMPEZIS: Hello everybody. I am a researcher in the Data and Web Science Lab in Greece. And today I am going to present work that is joint work with Lars Prehn from the Max Planck Institute and Emile Aben from the RIPE NCC. In fact, it's based on a recent article I have written on RIPE Labs, and the committee there liked it and invited me to present it in front of you. They thought that it would be interesting for you.
So, the topic is bias in Internet measurement infrastructure. And while I was thinking how to introduce it to you, I saw yesterday a talk by Doug Madory from Kentik, where the topic was how to measure RPKI ROA adoption based on some data they have from the networks of their clients. Presenting his methodology, he made an important point, which is noted there: that the analysis he presented is subject to the biases of the customer set, which includes mostly networks from the US.
And it's very good that he made this note, because it's important. When we present data, when we look at our own measurement data and we want to understand what's going on and get some insight, it's really important to understand if there is any problem in this data, if there is any bias, because this may affect our conclusions.
And that's exactly what we are investigating in this work. So, we're investigating the bias in measurement platforms like RIPE Atlas and RIPE RIS ‑‑ the bias that exists in these platforms that a lot of people use for their everyday issues in their networks.
So, what these platforms allow us to do is to view the Internet. Let's say that this is the Internet, composed of many networks of different types all over the world that are connected with each other. The measurement platforms give us a window, a window to see what's going on in the Internet. So, network engineers know what's going on in their own network, and if they want to learn what's going on in the outer Internet, they look through the measurement platforms ‑‑ so they are like a window.
But the problem is that the window is not like this. It's mostly like this. So, it's a stained‑glass window. And why we make this analogy is because some parts behind this glass cannot be seen. We don't have visibility everywhere. Some parts can be seen quite clearly ‑‑ the networks behind the transparent blocks ‑‑ and some other networks can be seen, but not that clearly: the networks behind the colours.
And this is exactly how we define bias in measurement platforms and measurement data. So, bias exists when we cannot see all network types equally clearly ‑‑ when we see some network types more clearly than others.
And let me give you a couple of examples to make this clearer. For example, RIPE Atlas and RIPE RIS have many probes and peers, and a lot of them are in Europe. This is normal, given that the RIPE NCC serves this region. However, this is a location bias: we may have a clearer view of networks in Europe than in other locations.
But location is not the only type of bias we may have. We may have network type bias. So, what this plot shows is the different network types, as declared in PeeringDB, and the bars show the percentage of networks that belong to each network type.
So, if we focus on the first category, the cable/DSL/ISP networks, and you see the blue bar: the blue bar is the percentage, among all networks, that are of this type.
Now, if we take into account the networks that are on RIPE RIS and route views, which are the green and red bars, we can see that these bars are lower than the blue bar. So what does this tell us? This tells us that cable/DSL/ISP networks are under‑represented in measurement platforms.
On the contrary, in the NSP category, we can see that the green and red bars are over‑represented, because they are higher than the blue bar. And this is a clear network type bias we have in the platforms.
Another, third example is topological bias. What this bar shows here is the average number of neighbours an ASN has. So, if we consider all networks, all ASNs in the world, the average number of neighbours is quite low ‑‑ it's the blue bar, it's barely visible.
But if we consider only the networks that are in RIS or route views, we can see that they have a lot of neighbours. And this is a huge topological bias we have.
So, these are only three examples, and the bias in fact has a lot of dimensions. These can be location, network size, network type, IXP connectivity, etc. So what we did is, we tried to analyse the biases in some important dimensions. We collected public data for all of these dimensions and tried to calculate the bias score in every dimension.
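One plausible way to get such a 0‑to‑1 bias score per dimension is to measure the distance between two categorical distributions: the feature's distribution over all ASes versus over the platform's vantage points. This sketch uses total variation distance with made‑up numbers; the project's actual metric may well differ.

```python
def bias_score(population, sample):
    """Total variation distance between the distribution of a feature
    (e.g. network type) over all ASes and over a platform's vantage
    points: 0 = identical distributions (no bias), 1 = fully disjoint."""
    cats = set(population) | set(sample)
    p_tot, s_tot = sum(population.values()), sum(sample.values())
    return 0.5 * sum(abs(population.get(c, 0) / p_tot - sample.get(c, 0) / s_tot)
                     for c in cats)

# All ASes vs. hypothetical route-collector peers, by network type.
all_ases = {"cable/DSL/ISP": 70, "NSP": 15, "content": 10, "enterprise": 5}
ris_peers = {"cable/DSL/ISP": 35, "NSP": 45, "content": 15, "enterprise": 5}
print(round(bias_score(all_ases, ris_peers), 2))  # -> 0.35
```

Computing such a score once per dimension (location, network size, topology, ...) gives exactly the kind of per‑radius values a radar plot can display.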
So, we calculate the bias score from 0, which means no bias, to 1, which means high bias, and we present all this analysis in this radar plot, which I will explain to you how to read.
Each radius of this radar plot corresponds to a dimension. So, you can see dimensions related to location, network size, topology, IXP connectivity, etc. And the coloured lines and areas correspond to the different infrastructures.
So blue is for RIPE Atlas, orange is for RIS and the green is for route views.
The value of the line corresponds to the bias score. So, for example, if we look at the location dimension here, the bias is 0.2. If we look at the topology dimension here, the bias of RIPE RIS is higher, because it's further from the centre. The further from the centre, the more the bias.
And it's around 0.4. So, what does this analysis tell us? What we can see in this plot is, first, that all three platforms are biased. And they are biased in most of the dimensions we took into account.
Here, it's important to note that these are some dimensions that we considered important and for which public data exist. But in this plot you can add more dimensions and do more analysis, or remove some dimensions if you don't care about them.
Another thing is that the bias is not homogeneous over all dimensions; there are some dimensions where we have a lot of bias and some dimensions where we have less bias.
So, for example, one of the main observations is that RIPE Atlas is biased, but it's less biased than the route collector projects, RIS and route views.
Another observation is that RIS has quite a large bias on topology features, which can be explained because route collectors are connected at IXPs, where networks peer with many other networks. And it also has quite a large bias in terms of network size, because the majority of the peers of RIS are large networks.
And finally, we can see that all platforms have relatively low bias on the network type dimensions ‑‑ so, in terms of how representative they are of network types. There is bias there too, but it's lower compared to the other biases.
And this analysis can be done also for other measurement platforms, or even for specific measurement purposes. So, for example, we analysed the bias of the peers of RIS and route views that provide full feeds, which provide more data. And the bias is higher: it's the red area, where you can see that it's larger than when we consider all the collectors' peers.
And another example is in the RIPE Atlas platform, where we compared the bias between the IPv6 probes and IPv4 probes; the IPv6 probes were a little more biased. However, the difference is not that big.
So, why is it important to know all of these things, to see this plot and maybe pay some attention to it?
It's because the bias, as I told you before, affects the insights we can get out of the measurements. So, we believe it's important for someone who works with these platforms, or wants to take decisions based on the measurement data they see, to take into account that there is bias, and maybe handle it, depending on their use case ‑‑ because there are different use cases and we cannot have a generic rule for all of them. So, something one can think about before trying to interpret the results is, first: which dimensions of bias would affect my measurements? Some measurements are about latency, some others are about routing paths, etc. Not all dimensions may be relevant to the measurement objective.
And then, they could go to our plot and see if there is bias there. And in fact, on our website, we provide more detailed data about why this bias exists ‑‑ for example, as I told you before, whether it's the NSPs, the CDNs, etc.
And in fact, we did a short survey and we found out that not all people are aware of the bias. There are some experts that know about it, so our main goal with this presentation and the article we have written is to raise awareness within the community ‑‑ to be careful when you interpret your results ‑‑ and then even to deepen the understanding of expert users.
So, there are also some next steps we can take about the bias. We can reduce the bias, and there are two approaches to this. The first is, from the existing infrastructure we have, to select what data to use ‑‑ for example, select probes, or select data from RIPE RIS, in order to reduce the bias in the data we have. And the second option is to extend the infrastructure, but in a way that is bias‑free.
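The first approach ‑‑ picking a subset of the existing vantage points so their mix matches the overall population better ‑‑ could, for instance, be a greedy search. This is an illustrative sketch with toy data and a toy distance metric, not the project's actual method.

```python
from collections import Counter

def tv_distance(p, q):
    """Total variation distance between two categorical count maps."""
    cats = set(p) | set(q)
    pt, qt = sum(p.values()) or 1, sum(q.values()) or 1
    return 0.5 * sum(abs(p.get(c, 0) / pt - q.get(c, 0) / qt) for c in cats)

def greedy_debias(population, vantage_types, keep):
    """Greedily drop the vantage point whose removal most reduces the
    distance to the population's distribution, until `keep` remain."""
    selected = list(vantage_types)
    while len(selected) > keep:
        best = min(range(len(selected)),
                   key=lambda i: tv_distance(population,
                                             Counter(selected[:i] + selected[i + 1:])))
        selected.pop(best)
    return selected

population = {"ISP": 3, "NSP": 1}  # target mix: 75% ISP, 25% NSP
probes = ["NSP", "NSP", "NSP", "ISP", "ISP", "NSP"]  # over-represents NSPs
print(greedy_debias(population, probes, keep=4))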
And I'm not going to present any of these results here, but stay tuned to our project; we are going to publish some results on this soon.
So, that was all from me. And what I would like from you is to give us feedback, because this is research we do for the community, for everyone. So, give us feedback in order to improve our analysis, or maybe build a tool for you, and in fact, tell us if you like it or if you don't like it ‑‑ if you don't like it, that would be even more important feedback for us.
And also apart from the questions we'll be around with Emile, so yeah, you can chat with us. Thank you.
FRANZISKA LICHTBLAU: Thank you very much. I would actually like to thank you too, because I was working in that research world as well, and I was always sure this phenomenon exists, I was just too lazy to investigate because it is a lot of leg work, so great job you did there.
SPEAKER: Do you want to know how deep the rabbit hole really is? I have seen it noted somewhere ‑‑ in IETF proceedings, probably ‑‑ that the PeeringDB network type field itself is significantly biased towards ISPs and away from enterprise networks. So, I mean, there is a lot of collateral there.
PAVLOS SERMPEZIS: That's true. Some of the biases I presented to you are based on the PeeringDB dataset, which is self‑declared. It's the best we can do ‑‑ we don't have more data to analyse ‑‑ but there are some other dimensions that are not based on PeeringDB, where we have information for all networks. This is a valid limitation, but if you have any idea on how we could complement the data in PeeringDB, that would help us.
SPEAKER: Thank you.
AUDIENCE SPEAKER: Individual network operator. On the first slide I think you had a bar chart where the all‑AS bar was higher ‑‑ lower than the RIPE Atlas or RIS bar. Yes.
So, I don't personally understand why all‑AS is lower than RIPE Atlas, the orange.
PAVLOS SERMPEZIS: It's not all of them. In fact, these percentages add up to 100% for each colour. So, if you sum up all the blue bars they are 100%, so they cannot always be higher. Did I miss something in your question?
AUDIENCE SPEAKER: Maybe I just have problems to understand that. But ‑‑ if it makes sense, then that's okay.
FRANZISKA LICHTBLAU: Let's take this offline.
AUDIENCE SPEAKER: Hi. Ivan Beveridge from IG. I was wondering if you'd done this analysis on any of the commercial kind of equivalent systems, perhaps ‑‑
PAVLOS SERMPEZIS: You mean commercial platforms? No, we did not.
AUDIENCE SPEAKER: Is that something you are planning on doing or...?
PAVLOS SERMPEZIS: It depends ‑‑ if we have the data, we can. What I mean by data is: we just need to know what points they measure from, where the networks they measure from are. Just give me a list of networks for a measurement platform ‑‑ be it RIPE Atlas or any other platform ‑‑ and I can produce this plot for you. I just need a list of networks as input.
AUDIENCE SPEAKER: That's very interesting. Thank you very much.
SPEAKER: James Rice. Apologies if I missed an explanation somewhere along there. I didn't quite catch what you would consider to be unbiased: is it equal weighting or even distribution per prefix, per IP address, per autonomous system number, or per network operator? What would you define as being unbiased here?
PAVLOS SERMPEZIS: First, our analysis has been done at the AS level. So we did not consider a more fine‑grained analysis. That's because most of the public data is at the AS level. So, in order to do it at a more fine‑grained level, you need to do some extra measurements to collect the data yourself. It's doable ‑‑ you just do exactly the same thing, but you need different data.
And that's one of the things that we are planning to do.
Now, what does it mean to be unbiased? For example, here in this plot, in the optimal scenario we would like the green bars, or the orange bars, or the red bars, to be equal to the blue bars. So, for example, let's say that in the world there are 10% large networks and 90% small networks; we would like the peers of RIPE RIS to be 10% large and 90% small. I understand that this is maybe not what RIPE RIS would like, because they connect to large networks, because they get more information from them.
So, bias is not the single objective we may have when we build a platform, and probably that's why the platforms are biased, but it's something we could take into account, and there are nice trade‑offs where you can both have a lot of data and satisfy your other objectives, and at the same time be unbiased.
AUDIENCE SPEAKER: Okay. Thanks.
FRANZISKA LICHTBLAU: Okay. Thank you, Pavlos.
DIMITRY KOHMANYUK: We now call our last presenter for today ‑‑ that's actually a last‑moment submission ‑‑ Aaron Glenn. We have got a bit of time, I think it's 15 minutes. I'll start it now.
AARON GLENN: It's really quick. I didn't realise lightning talks were this long.
My name is Aaron Glenn; I am someone else in this slide. If you are not familiar, I have been talking about programmable data planes and the language that allows you to do that ‑‑ or at least one of the most popular ones, it's called P4 ‑‑ for many years, and one of the biggest problems is the developer experience: people want to go from "okay, let me try to programme something" to actually programming it, not spending three hours downloading a Vagrant VM.
So, in order to fix this, you can open up your browser and go to the vanity domain c dot p, the number 4, dot works. You'll be presented with a little text box. This is to keep the general Internet out, because these are just little containers that last for three days. No login, no really saving state. The cookie will let you come back to the browser tab for like 24 hours ‑‑ GDPR scares me, I don't want to know anything about it. It's intended to be a quick thing over coffee, I really do mean that. If you spend a little bit of time and just run the Python scripts and the P4 app thing, you can experience a learning Layer2 switch in like 15 minutes, and my hope is that it kind of whets your appetite, so to speak, so you are actually a bit more interested and play around with it. If you do get bitten by the bug, I am happy to share the Dockerfile with you. I'm not just putting it out there because I want to talk to people who are interested, not looky‑loos. If you are interested, e‑mail me. I have another vanity domain here.
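For readers who have not seen the exercise: the behaviour of that learning Layer2 switch ‑‑ remember which port each source MAC arrived on, forward to the learned port, flood when the destination is unknown ‑‑ can be sketched in ordinary Python. The real exercise implements the tables in P4 plus a small controller; this class is just an illustration of the logic.

```python
class LearningSwitch:
    """Control logic of a MAC-learning switch: remember which port each
    source MAC was seen on; forward to the learned port, or flood out
    of every other port when the destination is unknown."""

    def __init__(self, ports):
        self.ports = ports
        self.mac_table = {}  # MAC address -> port

    def handle(self, in_port, src_mac, dst_mac):
        self.mac_table[src_mac] = in_port          # learn the source
        if dst_mac in self.mac_table:
            return [self.mac_table[dst_mac]]       # known: unicast
        return [p for p in self.ports if p != in_port]  # unknown: flood

sw = LearningSwitch(ports=[1, 2, 3])
print(sw.handle(1, "aa:aa", "bb:bb"))  # unknown dst -> flood: [2, 3]
print(sw.handle(2, "bb:bb", "aa:aa"))  # dst learned from frame 1 -> [1]
```

In P4 the same idea becomes two match‑action tables (source learning and destination lookup) plus a multicast group for the flood case, which is why the exercise fits in a coffee break.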
Last bit: I'm trying to set up a situation where you might be able to get some actual IP address resources Anycasted on the Internet and run your own P4 programmes attached to them. If you are able to do the exercises and make your own working Layer3 IP router, I can give you a /32 that's Anycasted, I can give you a /44 that's also Anycasted, and you can run your own stuff on the Internet. Set up anything that Linux supports, write your own P4 programme to handle the encapsulation ‑‑ obviously it has to go over the general Internet, so no SRv6. Anyway, it's not a long one. Everything is copied from the wonderful work of Laurent Vanbever and his group. They have a wonderful Git repo called p4‑learning; it starts with the hello‑world example of reflecting a MAC address back at you, example 4 is an MPLS label switch, so it goes pretty quickly and you can start doing some real stuff with it. But the surprising part to me, when I got into this a few years ago, is that it's not that many more lines of code: a Layer2 Ethernet learning switch doesn't need that many more lines of code to get MPLS and a deep label stack.
I'm sorry I don't have any slides on P4. Again, I figured this would be like a five‑minute thing. So, if you are interested in P4, there are a lot of great resources on the Internet. I gave two talks in the before times at RIPE; they are probably a little dated, but they should at least point you in the right direction.
Please try it out. Please reach out to me. Come find me, I am here all ‑‑ till Friday, and I look forward to interacting with any of you interested. Thank you.
FRANZISKA LICHTBLAU: Thank you, Aaron. But you have to stay on stage for a bit because we actually have a comment ‑‑
AARON GLENN: I didn't say what to put in the text box. All right, in the text box it is "The Future" ‑‑ capitalised T, capitalised F, there is a space. That will open it up for you. Don't tweet that. I don't need this thing on the general Internet and a bunch of people screwing it up. Thank you.
FRANZISKA LICHTBLAU: Good point but that was not the thing.
Well done. I mean, this was actually a real lightning talk for a change. So, we have Kurt Kayser in his private capacity, and he comments: "I consider P4 a very smart idea to decouple software stacks from hardware vendors. I really would like to see this used more in government networks that should not be locked into fixed hardware/software cycles from volatile vendors. Please consider it seriously."
AARON GLENN: Call Ed at Intel ‑‑ I told him that was hard, and you should tell him it's also hard. Yeah, it's a thing that's being considered. Please, great, e‑mail me. Love it!
FRANZISKA LICHTBLAU: Okay. Cool. Thank you.
And with that, we give you a couple of minutes back for your next coffee break. After the coffee break, in this room you will find the BCOP Task Force, which talks about best current operational practices, and in the side room is the meeting of the Code of Conduct recruitment team. After that we will have a great social event, two days of Working Groups, and the Plenary will be back for you on Friday. And please vote for the PC.
LIVE CAPTIONING BY
MARY McKEON, RMR, CRR, CBC