DNS Working Group
19 May 2022
11 a.m.
CHAIR: Good morning, we're going to start in one minute, so I'd kindly ask everyone to take a seat.
Since it's already that quiet, we can just start. Hi and welcome to the DNS Working Group session, my name is Moritz, this is my first meeting as a DNS Working Group co‑chair.
(Applause)
Thank you and thanks to everyone who supported my submission. I work for SIDN, the registry for the .nl ccTLD, and I do all sorts of research on DNS and DNSSEC, and I am also a part-time researcher at the University of Twente.
Just a brief look at the agenda
We have two remote presentations today, one from Sara and one from ICANN; the other presenters are hopefully in the room. And with that, we can already hand over to Sara for the first presentation, on DNS over QUIC.
SARA DICKINSON: So, hopefully you can see my slides and hear me?
MORITZ MULLER: We can see the files on Meetecho, but not on the screen. Yes, we can see your slides.
SARA DICKINSON: Thank you very much. Today I will be talking about DNS over QUIC, or DoQ, as it tends to be called. I had to update my slides because this became an RFC literally last week, RFC 9250, and I am a co‑author of that RFC along with Christian and Alison.
So, the topics I'm going to try and cover today are: why are we standardising yet another protocol for encrypted DNS when we already have two? To answer that I'll try and describe how it works and how it's different to DoT and DoH, and also talk about where we are in terms of implementation and deployment of DoQ. Firstly, a little bit of background on the QUIC protocol itself.
It was originally developed by Google in‑house, starting in 2012. They spent a few years developing it and decided that it was mature enough to move to the IETF to be standardised as a protocol; they did that in 2015, and at that point the hope was it would only take one or two years to standardise it. In true IETF fashion it took six years, and so QUIC was eventually standardised last year in RFC 9000.
Now, it is implemented and used by browsers and CDNs, and the best that I could come up with was that today, roughly 8% of websites are using QUIC.
The key characteristics of QUIC as a transport protocol are that it runs over UDP and it is always secured with a TLS 1.3 handshake, so there is no unencrypted version of QUIC. It is an inherently encrypted protocol.
It has reduced latency in terms of its handshake, and it is relatively easy to achieve zero RTT using QUIC. Importantly, QUIC is a multiplexing transport protocol: it uses multiple streams within a connection to achieve this, and the key thing that gives you over TCP is that QUIC suffers from no head‑of‑line blocking, which is very important.
As another gain over TCP, it has improved error detection and loss recovery, and this makes QUIC connections more reliable and more robust under heavy traffic and packet loss.
QUIC also has a neat feature called connection migration, which means that two end points that have a QUIC connection between them can maintain that connection even as the end points change IP addresses and that can be very helpful when you have a mobile client for example. Google developed QUIC to be the transport protocol on top of which they could run HTTP 3 to give them enhanced performance for web traffic.
However, it became clear very quickly, when you think about the characteristics of the QUIC protocol, that it's an incredibly good fit for encrypted DNS. It has low latency, which is important in DNS, and you get all the benefits inherent in the QUIC protocol, but you also have source address validation built into QUIC, which is very important in DNS. And also with QUIC, the path MTU does not limit the size of the messages that you can send, unlike plain DNS over UDP.
So, we started work on it quite a long time ago, but unfortunately its standardisation was blocked by the standardisation of QUIC itself, which took until last year.
Now, just to reiterate the point about the similarities and differences between the different protocols. On the left here, you have the protocols over TLS, so that's DoT and DoH, and DoH is where DNS messages are encoded within HTTP/2 running on top of TLS. On the right, you have DNS over QUIC, which is analogous to DoT. What's also possible, and is a slight source of confusion sometimes, is that we already have the specifications to run what's normally called DoH3, where you encode DNS messages in HTTP/3 running over QUIC. I am seeing messages that my audio is broken.
MORITZ MULLER: We're not sure if it's your side or our side.
SARA DICKINSON: I am trying to reload the audio here.
MORITZ MULLER: Could you stop and restart the audio?
SARA DICKINSON: Let's see if that's any better?
MORITZ MULLER: Unfortunately not.
SARA DICKINSON: Is that any better?
MORITZ MULLER: Sounds better, yes, thank you.
SARA DICKINSON: Let's give that a try. Okay.
Right. So, on this slide I'm just comparing the various protocols and what we're going to focus on is DoQ running over QUIC.
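To make the layering she is comparing concrete, a rough sketch of the protocol stacks described above (nothing beyond what is on the slide):

    DoT:   DNS messages over TLS over TCP
    DoH:   DNS messages in HTTP/2 over TLS over TCP
    DoQ:   DNS messages directly over QUIC (which runs over UDP)
    DoH3:  DNS messages in HTTP/3 over QUIC (which runs over UDP)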
So, the background to the development of DoQ. We originally wrote the first draft for this back in 2017, and based on that, AdGuard actually created a stub‑to‑recursive resolver service in 2018 using that specification and have been running it ever since. However, it took until 2020 for the draft to be adopted in the DPRIVE Working Group, which is the working group concentrating on improving privacy in DNS. After working on the draft for a while, the following year the 03 version of the draft contained several big changes. First of all, it rescoped it so that it wasn't just stub‑to‑recursive only, as DoT and DoH currently are, but it also covered zone transfer and recursive‑to‑authoritative as well.
The mapping was updated slightly, and we made a port request for port 853. I'll touch on both of those topics in a later slide.
After those updates, we move into the last call at the end of last year, and then just this month it was approved for publication as an RFC.
So, first a quick look back at what the DoQ handshake looks like. This is an image blatantly copied from a very good blog by Cloudflare, who have several blogs about DoQ, so I recommend reading them. As you can see on the left, when using TLS over TCP you have to set up the TCP connection first and then do the handshake for TLS, whereas with QUIC the connection setup and the handshake are combined in the initial exchanges, so it's much quicker.
DoQ also has its own ALPN, as does DoT and this is one way to distinguish them.
Looking in more detail at what the DoQ connection itself looks like. Inside a QUIC connection there are streams, and there are different types of streams; the kind that DoQ uses is a bidirectional, client‑initiated stream, and those have IDs which are multiples of 4, so 0, 4, 8, 12 and upwards. The mapping for DoQ uses a single stream for a single DNS query/response transaction. Then that stream is closed.
Now, streams are readily available inside a QUIC connection. The stream ID is a 62‑bit value, so you can send lots and lots of messages inside a single QUIC connection. And in the mapping, to avoid any conflict between the DNS message ID and the stream ID, we always set the DNS message ID to zero for messages on these connections, and that's just something that has to be kept in mind if you want to work with DoQ.
I mentioned that the mapping changed slightly. In the original mapping, DNS messages were encoded directly into the stream. However, that presents a limitation: you can only send a single response per stream. So in the 03 version of the draft, it was updated so that we now prepend the DNS message with a two‑octet length field, just like we do in TCP, and this provides the flexibility so that servers can then send multiple responses within a stream, and then we can support zone transfer, where a single zone transfer takes place within a single stream.
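As a rough illustration of the framing just described (a minimal sketch, independent of any particular QUIC library), the per‑stream encoding of a DoQ message looks roughly like this:

    import struct

    def doq_encode(dns_message: bytes) -> bytes:
        # Prepare a DNS message for sending on a DoQ stream.
        msg = bytearray(dns_message)
        # The DNS Message ID is set to zero on DoQ; the stream ID disambiguates transactions.
        msg[0:2] = b"\x00\x00"
        # Prepend a two-octet length field in network byte order, as in DNS over TCP.
        return struct.pack("!H", len(msg)) + bytes(msg)

The length prefix is what lets a server place several responses, for example the messages of a zone transfer, back to back on the same stream.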
As I mentioned, the other update in the 03 version is that we specifically defined DoQ as a general‑purpose protocol for DNS encryption, and the draft describes three scenarios: the stub‑to‑recursive scenario, for which AdGuard are already running services and claim good performance, particularly in mobile networks, which is something you would expect with DoQ.
Then the recursive‑to‑authoritative scenario, and it's thought that DoQ is more attractive here than the other protocols because it should certainly be more performant than DoT and it's lighter weight than any of the DoH options.
And we also specifically define that it can be used for zone transfers. Last year, RFC 9103 was published, specifying how to do zone transfers over TLS, and using DoQ for zone transfer is a direct extension of that specification.
There was quite a bit of back and forth over what port to use for DoQ. We originally suggested port 784, because back in 2016, when DNS over DTLS was being developed, port 853 was assigned to DNS over DTLS. However, it was recognised that that remains an experimental specification and to my knowledge has no implementation or deployment five years after specification ‑‑
MORITZ MULLER: Sorry, your sound seems to be gone south again.
SARA DICKINSON: I have reset audio again. Hopefully that's better.
However, after some discussion over port allocations, it was finally agreed that DoQ could also use port 853. One of the reasons this is possible is that QUIC version 1 was specifically designed to be able to coexist with DTLS; there is a bit in the header which lets you determine which protocol it is.
Looking at the existing implementations of DoQ, I have listed the open source ones here. Kudos to AdGuard: they open sourced all the code that they used for their resolver service. On the server side, they use an extension of CoreDNS written in Go. On the client side they have got some C++ libraries. There are another couple of libraries, one in C and one in Python, if you want to experiment, and there is experimental support for DoQ in Flamethrower, which is a DNS performance utility.
To my knowledge there are no implementations in any of the major open source recursive resolvers or authoritatives but I am hearing very encouraging statements from some of the big operators about wanting to do trials and wanting to move forward using DoQ.
And in terms of actual deployment, there are two that I'm aware of: AdGuard, who have been around for a while, and a separate service run by NextDNS. There was also the first academic study of DoQ last year, and that actually identified 1,200 DoQ resolvers that would answer when they did probing, and it's quite an interesting study of the current state.
In terms of recursive to authoritative, there are still lots of ongoing discussions about how various authentication mechanisms would work in that scenario, but there is a proposal for how to do unilateral probing, and DoQ lends itself to this in the same way as DoT does, in that you can just probe port 853 to see if a server answers. So it looks like there is some interest in using DoQ to experiment with unilateral probing from recursive to authoritative.
There is a little bit of ongoing work remaining. It would be nice to do some more work on padding; we are still using an experimental specification to pad encrypted DNS traffic. Some implementations observed in the wild in the academic study that I mentioned were not fully optimised and weren't quite achieving 0‑RTT when they should have been; that was due to what appeared to be implementation issues, so that can be looked into. One of the key things with DoQ is that we're still lacking performance measurements that directly compare it to DoT and to DoH, and I think those will be particularly helpful to encourage people to consider it for recursive‑to‑authoritative deployments.
So, in summary, DNS over QUIC is now an IETF standard. There are several stub‑to‑recursive DoQ deployments in use today, and it looks like a likely candidate for recursive‑to‑authoritative experiments using probing. If you want more background on DoQ, or on work on DNS encryption in general, you can have a look at the DNS Privacy Project website, which tries to track the most recent developments in terms of standardisation.
With that, I thank you the audience for putting up with what I understand was some tricky audio at times and I hope we have time for a few questions.
(Applause)
MORITZ MULLER: We do have time for questions. Are there any questions? Yes. One.
SPEAKER: Carsten. Sara, thank you for this presentation. And sorry that I didn't use the app, it didn't work on my devices.
My question is: for DoH, it's possible to use web server software like nginx to proxy between HTTPS and DNS. Do you know if that is possible with DoQ as well? Because nginx supports QUIC already, and it might be possible to have DoQ installations on top of that.
SARA DICKINSON: That's a great question. Yes, those proxies do support QUIC. The last time I looked, which was a few weeks ago now, they didn't yet have native support for doing DNS over QUIC. I would hope that that is coming. I wrangled with the configurations and I couldn't find an easy way to do it, so I am planning to open an issue with the implementers, just to raise awareness that, now that it is a standard, people would be interested in trying that out using those proxies, yeah.
SPEAKER: Thank you, I will play with that and if I find a solution, I will let you know.
SARA DICKINSON: Thank you.
MORITZ MULLER: We have one more question in the queue, by Brett Carr, and then also one question in the Q&A.
BRETT CARR: Good morning. Sara, thanks for the presentation and congratulations on the RFC, I think it's been a long time coming and I have seen bits of these presentations several times before and it's always interesting to see.
It feels like DoQ is a bit like the second coming for the DNS; it solves lots of problems we have had for many years. In your presentation you talked about the areas of DNS where you thought it was suitable, which seemed like all the areas of DNS, but I just wonder if there were any areas of DNS where you thought that DoQ was not suitable, or less likely to get deployed?
SARA DICKINSON: I can't think of anything immediately in terms of the actual specification. I think the challenge will be the practicality of implementing it, because, as some people have already mentioned, the ease of picking up QUIC libraries and integrating them with existing DNS software could be a bit of a challenge. So I think it will depend on how much work that actually turns out to be in practice for some of the major open source implementers. It's not quite as easy as just picking up OpenSSL and putting it in; you have to think quite hard about which library to use and what the API looks like, because they do vary quite a bit, so I suspect there will be an initial implementation hurdle. But once that is overcome, I really do hope that we will quite rapidly see performance data that makes clear that that is a cost worth paying to deploy this.
SPEAKER: Thank you.
MORITZ MULLER: Before we go to the last question Sara, could you try to fix your audio once again. And in the meanwhile, I hope you can hear me still. I will read out the question by Alun Burns. "Does DoQ support updates as well as queries?"
SARA DICKINSON: It does. Have a look at the draft, though, because there is some advice about the use of 0‑RTT for updates as opposed to queries, because there is a potential replay issue there. But yes, there is no reason you can't do updates over DoQ as well.
MORITZ MULLER: Thank you. With that, another round of applause for Sara.
(Applause)
And we'll go to the next presentation by Petr Spacek about DNS Catalog zones.
PETR SPACEK: Hello, I work with basically all things DNS. In this talk we will briefly look at catalog zones: what they are, what problems they try to solve, and why they are interesting.
So, first the problem.
Well, it's generally the management problem of DNS, because the protocol just doesn't have any management built in whatsoever. And if you have a lot of slaves, or secondary servers as we call them now, it gets harder and harder with the number of zones you have. For a single zone, it's simple; in the case of BIND it's just this: name of the zone, IP address of the primary server, that's it.
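For a single zone, the sort of minimal BIND secondary configuration he is alluding to looks roughly like this (zone name and address are purely illustrative):

    zone "domain.example" {
        type secondary;
        primaries { 192.0.2.1; };     // IP address of the primary server
        file "domain.example.db";
    };

Multiply that stanza by thousands of zones and dozens of servers and the scaling problem he describes becomes obvious.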
As the number of zones increases, it gets messier and messier; as you add servers, it's even worse, and at some point it's just unmanageable.
And if you have lots of zones or lots of secondary servers or frequent updates or any combination of these three, it's just mind blowing.
It's not a new problem, and anyone who has had this problem has a solution for it, usually home‑grown scripts. The great advantage is that it actually works, so it solves your problem, but the disadvantage is that it solves the problem only for you. As soon as you want to modify it in any way, say migrate to a different DNS implementation, or add another implementation to the mix so you don't rely on one vendor but use two, you have to modify all the magic scripts which do the management. Or, if you want to use an external party to serve your zones as well, it's even worse, because external parties usually won't buy into your scripts.
Solving that problem in general is actually pretty hard. DNS vendors have been thinking about a generic DNS provisioning or management protocol for a long time; many people in this room have tried and failed over the years. But the thing is that if you really focus on one specific problem, if you limit the scope, it becomes much more manageable, and that's what catalog zones try to do.
Essentially the observation here is that on the secondary server you typically have a huge list in the configuration file, but almost the only variable part is the zone name; all the rest is almost the same. So let's try to use that.
Well, we are DNS engineers, and as the old saying goes, if you have a hammer, everything looks like a nail. If you are a DNS engineer, you know zone transfer, so everything looks like a DNS zone. This is how we arrived at the so‑called catalog zone, which is a DNS zone which contains a list of zones. It's very weird when you hear it for the first time, but the reason why we are going in this direction is that once we have the data in the form of a DNS zone, we can use zone transfers, notifies and all the mechanisms we already have built into the servers. So it doesn't require lots of new code; it's proven technology, so we can try to use that.
The obvious catch is that it's kind of confusing, because it's configuration formatted as a DNS zone, so it's not really a normal DNS zone. There is no DNS client actually querying this zone; there is no client which is supposed to get the records. It's just the configuration for the secondaries.
But, let's get over that, it's just one zone which will rule them all!
So, before we dive into specifics, let me warn you: there are two versions. Version 1 was used by older versions of BIND; it's still supported, but you really should use Version 2. The reason is that Version 2 is a joint work of multiple vendors and it can actually work across different implementations at the same time.
Approximately a month ago, during the IETF meeting in Vienna, there was a so‑called hackathon, and during that we did interoperability testing between the different vendors, and we verified that Version 2 as implemented now works between the latest version of BIND, released just yesterday, the current Knot DNS version, and also the upcoming version of NSD.
I have to say that not all names are on this slide. So if your name didn't fit on the slide, I am sorry, but it's not personal, it's just a small slide. It was a lot of people working on this for a long time.
If you open the documentation, don't be surprised there are cats all over the place just because we can.
Finally, the technical part.
What it actually is.
Again, it's a configuration file, but it looks like a DNS zone. So it has to have a zone name. It can be a completely arbitrary name; just use something which doesn't clash with anything you use in the DNS tree. No client is going to query for it, so the name can be completely random.
Then, again, because it's pretending to be a DNS zone, it has to have an SOA record and it has to have NS records, but they are not actually used for anything useful. Then the important part: the version number. That's really important, because there are two versions. If you forget the version number, it might accidentally work, but it's bound to break on upgrades, so just put the number there. Don't forget.
And finally, thank you for bearing with me, we arrive at the last two lines, which are the reason why we are here. They look like PTR records, but they are not really used as PTR records. It's the way the catalog zone lists the zones which should be provisioned on the secondary servers. Here we have domain.example and domain2.example, and these two zones will be provisioned to the secondary servers automatically. At the beginning of the line, we have some random IDs. They can be random, they don't mean anything, they are just for internal use in the protocol; just make sure that they are unique and that's it.
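Putting those pieces together, a minimal catalog zone along the lines described might look roughly like this (the zone name, IDs and member zones are purely illustrative):

    catalog.example.               IN SOA  invalid. invalid. 1 3600 600 2147483646 0
    catalog.example.               IN NS   invalid.
    version.catalog.example.       IN TXT  "2"
    id1.zones.catalog.example.     IN PTR  domain.example.
    id2.zones.catalog.example.     IN PTR  domain2.example.

The SOA and NS content is essentially filler, the TXT record carries the schema version, and each PTR record under the "zones" label names one member zone to be provisioned.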
So, we have specified a catalog zone in the form of a text file. Now we have to configure the servers. First the primary side, and luckily there is nothing special there: it's literally just a zone, because it pretends to be one. So you just configure the name of the file, the type primary, the name of the catalog zone, and that's it. Of course, the primary also has to provide the data for the domains or zones you want to provision, but that configuration is all the same, nothing special here.
If you go to the secondary, it's a little bit different. Again, the beginning is the same as usual: name of the zone, name of the catalog, IP address of the primary server which holds the zone. The secondary will fetch the catalog like any other zone, because from the perspective of the DNS protocol it pretends to be one. Then there has to be a magic bit of configuration here which enables the special interpretation of the zone. Once the zone is pulled in, the server will look inside it, find the PTR records, and based on those PTR records it is going to provision the secondary zones. And of course, the name of each zone alone is not sufficient, because the secondary server has to know where to get the data for those zones, and that's why the IP address here specifies the address of the primary from which the zones listed in the catalog should be transferred.
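As a rough sketch, the two sides in BIND 9 look something like this (addresses and names are illustrative, and option names vary slightly between BIND releases, for example default-masters instead of default-primaries in older versions):

    /* primary: the catalog is just an ordinary zone */
    zone "catalog.example" {
        type primary;
        file "catalog.example.zone";
    };

    /* secondary: transfer the catalog and interpret its contents */
    options {
        catalog-zones {
            zone "catalog.example" default-primaries { 192.0.2.1; };
        };
    };

    zone "catalog.example" {
        type secondary;
        primaries { 192.0.2.1; };
        file "catalog.example.db";
    };

The default-primaries address is what tells the secondary where to fetch the member zones listed in the catalog.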
Okay, so we have configuration. How does it work in practice?
Let's start in a situation where we have the primary server configured with a catalogue zone, secondary server configured with a catalogue zone but no other zones are in the catalogue yet.
So this is step 0 basically, and then we provision the first zone we want to serve from all our secondaries onto the primary server: we modify the configuration of the primary in the text file as usual, call whatever command is needed, and the primary will reload the zone; nothing special here.
Also, the primary will send out NOTIFY messages; that's the first line here. But at this point the secondary doesn't even know about the zone we just provisioned, so it will probably log: I'm not authoritative for this, go away.
Then we modify the catalog zone: we add the PTR record with the name of the zone we just added, and then again the primary will send out notifies, this time for the catalog zone. The catalog zone is actually configured on the secondary, that's line number 2 there, so it will transfer the catalog zone as it does with all other secondary zones, then it will parse the content of the catalog zone and notice a new zone, which is line number 6 over here. Once the zone is read from the catalog, it will immediately do a transfer from the primary, so it will transfer the zone to the secondary and start serving it, and that's it. With this setup we didn't have to touch the configuration file on the secondary when we provisioned a new zone. That's it.
As I mentioned at the beginning, this protocol actually interoperates, so we can use a different implementation on the secondary side and it will just work. This is the configuration for Knot DNS, and essentially it's the same: it has to contain the same information, it's just a different format, and that's it.
So again, we have the name of the catalog zone here, the IP address of the primary server to get the catalog from, and the special option which enables the special interpretation over here: the catalog role, interpret. That's again instructing the server to interpret the PTR records inside the zone in a special way.
Exactly the same as for BIND, we need an address where the secondary can get the content of the zones listed in the catalog, and that is done using a template, using the Knot DNS configuration format.
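A rough sketch of the Knot DNS side, based on the options described here (addresses and identifiers are illustrative; check the Knot documentation for the exact option names in your version):

    remote:
      - id: primary
        address: 192.0.2.1

    template:
      - id: catalog-member        # settings applied to every zone found in the catalog
        master: primary

    zone:
      - domain: catalog.example
        master: primary
        catalog-role: interpret   # parse the PTR records and provision member zones
        catalog-template: catalog-member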
In practice, the log is again saying the same thing using different terms. On line number 1, we get the notify for the catalog zone over here. Then it transfers the catalog zone as usual, and then again it notices that, ah, here is a new zone in the catalog, over here on line number 4. Then it provisions the zone on the secondary, transfers the content of the zone, starts serving it, and that's it.
It looks simple now. It didn't look like it when we were implementing and developing the protocol. Anyway!
So this is catalog zones Version 2. Please use Version 2; forget about Version 1. Version 2 is actually interoperable between implementations, and more support for it is coming, I am being told. If you are interested in more details, there is a full specification of the protocol available; the slides have a clickable link over here. And if you want to discuss with the people behind the protocol and behind the implementations, you can click on this link and it will lead you to the DNS‑OARC chat, and in that chat you can find all the people involved in the protocol. It's open; you just sign up with your e‑mail and that's the only requirement.
That's about it from me. Thank you for your time and attention, and hopefully we have some time for questions.
MORITZ MULLER: Yes we do. Thank you.
SPEAKER: A little bit more on Version 1, because there might be BIND users here using catalog zones. If you are still using Version 1, do indeed update to Version 2, but note that the existing custom properties, like allow-query and the primaries property, have to move under the "ext" label. All those things will still work, but you have to move them under that label, according to the specification. So that's for your migration plans. A little bit of information.
PETR SPACEK: Just read the docs.
SPEAKER: Lars Liman from Netnod. Could you explain why you decided to overload existing record types with new semantics?
PETR SPACEK: I can't explain that.
SPEAKER: Who can? Someone came up with that idea.
PETR SPACEK: It's lost in the mists of history, and that's exactly why I put the link over here. You can express your opinions there.
SPEAKER: All right, I will. Thank you.
SPEAKER: Yet Jenna. I actually had an original question about that as well. It sounds like something that could use a cats clause even. It's nice to see you have something that's very useful without going into the camel hole of doing a full name server control protocol through zones. So kudos.
But can you also ‑‑ in the example you used AXFR; can you also tell it to use IXFR?
PETR SPACEK: The zone transfer works just the same. The only thing which is different is the interpretation once you have the zone. It's no different from the perspective of transfers.
MORITZ MULLER: One question remote, or more a comment: "If you are an operator and you are interested in sponsoring the implementation of catalogue zones, please reach out to me." That's Peter van Dijk. And then another question, actually a very long one.
"Can secondary support multiple catalogue sources?"
PETR SPACEK: Yes, it's a normal zone, so you can have as many as you want; you just declare in the configuration that these are the zones which have this special magical behaviour.
MORITZ MULLER: And the second question: "What is the impact of changing the ID associated with a given zone? Is it a convenient way to flush it at the secondary, for example to trigger a full transfer whatever the previous zone serial?"
PETR SPACEK: Yes, it's basically the same as if you removed the zone and added it again.
MORITZ MULLER: In terms of time I will skip the last question and leave the floor to Niall.
NIALL O'REILLY: Two comments. I think the answer to Liman's question is Paul Vixie. I seem to remember, long years ago, mails from him about this concept, and I think that's what made it into Version 1.
The other question is: is it the case that, implicitly, for the, if you like, payload zones, the primary is the same as the primary of the catalogue?
PETR SPACEK: Not necessarily. If I manage to go back far enough in the slides: there is a different configuration for the catalogue zone itself and for the zones listed in the catalogue, so you can have different sources for these two sets.
MORITZ MULLER: One more last question by Brett. I'll ask you to be brief.
SPEAKER: I will try to be. First of all, Niall just said it was Paul Vixie's fault. He floated this idea past me back in, I think, 2004, so it has been around since then; it's been a long time coming.
Secondly, I wonder, when the secondary picks up new zones from the primary via the catalogue, does it write those into its configuration file? I am presuming it doesn't, and if it doesn't, is there a way for me to get a list of the currently configured zones?
PETR SPACEK: That's something the protocol doesn't prescribe; it's implementation specific. I can talk only about BIND. In the case of BIND, the zones will get written to zone files on disk, so you can have a look at them and work with them using the rndc interface as usual.
SPEAKER: Okay. And secondly, very quickly, I wonder if there is any way within the protocol, or within the specific implementations for there to be some concept of approval of a zone before it's added? So, for example I might have a relationship with a primary provider who I give permission to add new zones, but I don't want them to suddenly add 500 new zones at once?
PETR SPACEK: Okay, that's again something that the protocol doesn't prescribe; it could be an implementation-specific thing for rate limiting or whatever you want.
SPEAKER: Is there any concept of that in the current implementation of BIND?
PETR SPACEK: No, there isn't. It's just a normal zone; whatever is in the zone will get loaded immediately.
SPEAKER: Okay. Thank you.
MORITZ MULLER: Thank you.
(Applause)
Then we have Joao.
JOAO DAMAS: Hi. I work with Geoff at APNIC doing experiments on measuring the Internet. In today's presentation I'm going to report on which resolvers people use, particularly in the context of this ‑‑ I don't know how many people are aware of the DNS for EU, I don't know what you call it, initiative? Okay. Good.
The text that's there is probably not what they said, but I think it's what everyone interpreted, and that's what, at least, I had in my mind when I started hearing about this DNS for EU thing.
I understand why they might be thinking along those lines even if they are not saying them out loud. But anyway, if this is the premise that's behind the whole effort, how true is it? Is there a problem?
So we decided, Geoff and I, to go and see what resolvers people use and it's like what does this mean? How can we measure it?
So, as you have probably seen throughout these last years, we have these experiments where we use ad placement on Google's advertising platform, which has pretty good reach; I think currently there is only one country that stopped getting ads since late February, but everywhere else you can get an ad with Google. In principle, what this lets you do is a very simple activity: JavaScript embedded in the ad performs a web fetch, and with each web fetch comes a DNS resolution query, right.
So that's what we use. How do we craft this? There are a few prerequisites we need to get this done properly. As you can imagine, when you, as a user, query something like quad 8, Quad9 or quad 1, that's not the address we see at the authoritative servers as the source of the queries; that's only the address presented to the end user.
So if we are going to find out who is behind which DNS query, the first thing you need to do is assemble a list of the IP addresses that those services use for the actual queries. Now, people like Google, for instance, very helpfully provide pages where they list those addresses; for others it's harder, but if you do the homework, you can assemble this list and then use it.
Once we start seeing the results of the experiment, we classify them. Is the resolver we see asking in the same AS as the user we sent the URL to fetch? Does it belong to an open resolver? Is it geographically nearby, like in the same country, or is it of completely different origin?
So, when we did the first iteration of this experiment, what we observed is that in about 73% of cases, roughly three quarters, we saw one query by the client generating one query by one resolver. But on average, if you count them all, for each DNS query ‑‑ URL fetched, sorry ‑‑ we see 1.65 queries from distinct addresses, and this has a little bit to do with how clients wait for the response to come back and how impatient they sometimes are, but also with how these open resolvers use their pools of machines to do the work, and how their internal communication, or lack of internal communication, sometimes triggers more than one request.
So in the first iteration, we took all these queries and, depending on where they were coming from, just categorised them and added them together.
The question that then arises is: okay, we see some people using more than one resolver; can we actually design an experiment that flushes out the full list of resolvers that people have configured? So we adjusted the experiment. We rewrote the DNS server we use to have one experiment type where the response the resolver gets is always SERVFAIL. If the resolver gets SERVFAIL, it will communicate that back to the client, and the client will basically be forced to iterate through the list that it has, and in that way you shake the tree and get, for every client, the full list of configured resolvers.
And what do we see when we do that? Roughly three quarters of the people have the resolvers of their ISP configured, the ones that you get by just using the ISP. Then 26% of people have Google configured in their list. An additional 9% of people have resolvers that are in the same country, and the fourth category was Cloudflare, at 6%. The graph shows you the evolution since the experiment started, and all this data is available on the stats.labs.apnic.net site if you want to look at it; you can easily go there.
So, what this second iteration of the experiment gives you is the full list of resolvers configured on the user's machine. That doesn't mean that the user gets to use them all the time, or even at all; typically the client uses only the first one. So the fact that you could potentially use all these resolvers doesn't mean that you will actually use them in normal operation.
So, which resolver provides the answer that the user will actually use? We went back to the first experiment, but instead of collecting all the queries and adding them together, we said okay, let's select only the first answer that we provided to the user and take that as the one that the user is actually going to use.
If you do that, you get slightly different results. Same AS, so the ISP-provided recursive, is about 71%. Google is down from 26% to 15%; in the first iteration, I don't remember if we saw the graph, it was around 16%. People who use resolvers in the same country, around 7%, and Cloudflare also goes down a little, from 6% to 4%.
So, in summary, those are the numbers we got. And of course, when you see these numbers and someone asks you what is so‑and‑so's market share, there are different answers depending on how you define the question, right? Is it 26% for Google or is it 15%? Depending on what your question is, the answer will be different.
There is also the question of, when we say "the resolvers we use", who is "we"? The ads get displayed to everyone that gets online, basically, because that's one of the good features of Google, at least from our point of view; if you are on the receiving end of the ad, not so much.
So, usually, the concerns of the European Commission have to do with consumers, with individuals. Not so much enterprises. Enterprises are usually free to do whatever they want within their little regions of authority.
So, how about we take the previous data and we cross correlate these data with AS numbers that we know belong to consumer networks. So the typical home ISP, right.
In that case, what we get is different. The people who use a resolver in the same AS, usually the one their ISP provides, goes up to 87%. Same CC, so not the same AS but still the same country: 6%. As I said, one thing that we observe is that there are several ISPs out there, particularly big ones, that separate the access network from their internal services network, so the two ASes actually belong to the same organisation; it's just how they organise the network, right. So there is a very high likelihood that "same CC" actually means the same company, just a different network belonging to the same ISP.
And Google goes down to 4%. This is again home users, and CloudFlare to 1%.
So, what this is showing is that consumers, people like you and I when we are at home, in the vast majority actually use what their ISP gives them, and in that sense you have to wonder if there is a problem.
The picture for enterprises, as I said before, is different, and you can see this from the weekday/weekend differences. It was also very visible at the beginning of the whole lockdown period, when everyone suddenly started working from home rather than the office: there was a clear shift in the graphs showing that when people stayed at home, Google usage went down. So things point to it being the enterprises, or enterprise networks, that actually use open resolvers the most.
So is there really a problem then, in the context of DNS for EU? Well, it depends who you are and how you phrase the questions, right. As we said before, the majority of users at home use what their ISP gives them. They don't give their data away; probably they don't even know that there is such a possibility.
But there is increasing concern about open recursive servers becoming centralised points of DNS operation.
So, it's not yet a problem, but it might become one. What can we possibly do about this?
Perhaps there are better ways than the approach that the European Commission was proposing, which came with a lot of things attached and basically created a different set of centralisation problems rather than eliminating them. Throughout all these years it's been known to everyone that the Internet works best when it self‑regulates, when it's the people who run the networks on behalf of the users that do the self‑organisation.
So, how about we think of a different approach. If there is going to be a problem, if there is going to be a concern, there is room for us, the people who run these services perhaps, to come up with a set of operational practices that people could subscribe to, adopt, and point to as a reference as this is how we operate our network, and then anyone who puts up an open resolver service could just subscribe to these as this is how we do things.
Traditionally, RIPE has been a rather good venue for coming up with these sets of rules, because the people who run the services are actually ones who attend RIPE.
And RIPE has had, in the past, ways to do this kind of thing quite quickly in the form of task force.
So, that would be my question to the Working Group: does this raise any sort of concern, first of all? Is the proposed approach something that you see as feasible? I think I have already seen within the CENTR community some loose threads about defining what we think is a good open resolver, and of course CENTR is also well represented here, and they would be welcome to participate. Is this something that the Working Group sees as a work item that it can adopt and then publish whatever the outcome is? And if so, would a task force be the right way of conducting this quickly, or do you prefer to just have the whole room participating at the same time?
Thank you.
MORITZ MULLER: Thanks Joao.
(Applause)
AUDIENCE SPEAKER: Hello, Patrik Tarpey from Ofcom. A really interesting study, and something that I have been rolling around in my head. With the emergence of oblivious technology, in particular oblivious DoH, and lately we saw the release of Apple Private Relay ‑‑ if those were, shall we say, more widespread in their adoption and became ubiquitous, what would you expect to see in terms of the results? My initial reaction is that Apple Private Relay kind of sidesteps the ISP resolver, so would you naturally expect to see more traffic emerging from the Apple Private Relay partners and, in general, oblivious deployments?
JOAO DAMAS: Yes, we would probably see an increase in this localised traffic. In fact we are already seeing a bit of that, partly from Apple's deployment of this thing, which, even if it is supposedly beta, is actually working. But we also have VPNs, and we have to do some manual fudging of the data carefully, because we are increasingly seeing queries coming from places that look like data centres, which don't have any real users, we know that, but are end points for these sorts of VPNs, which are also becoming more and more popular. The problem with VPNs is there is no two-stage decoupling of who is asking and what they are asking; you are just shifting your problem to someone else, and you probably don't even know who they are. So the oblivious approach, I hope, will catch on, and it will address perhaps some of these problems, though it converts them into a different problem. For instance, Apple uses, I think, Cloudflare as their second relay, and although you decouple the two pieces of data, I wonder if that will be enough to address people's concerns.
SPEAKER: Just a supplemental question: Are you planning to rerun it again, this series of testing?
JOAO DAMAS: It's constantly running. The experiments are running every day, the data is being added to the website, and we will tweak the experiment as we see how the situation evolves.
SPEAKER: Thank you very much.
MORITZ MULLER: I have to cut the queue now and therefore the people in the queue please be brief.
SPEAKER: Peter Hessler. I have an anecdotal piece of information for one of the questions you asked. When you did the ServFail test, you saw a huge amount of traffic going to Google and then not so much with the so‑called actual testing. I have seen a handful of networks where they hand out their own resolvers and then also Google Public DNS, as a fallback in case their own fail. And so that is potentially ‑‑
JOAO DAMAS: It's exactly the behaviour we see. Like I am not faulting them, we don't want to be calling them at four in the morning.
MORITZ MULLER: One comment from Meetecho. He says: thank you for this presentation; just to comment, I think the initial statement could be just part of the story, because among the DNS for EU requirements there are GDPR requirements, filtering required by law, optional parental control and wholesale services.
JOAO DAMAS: Yes, and that's a little bit of concern there. There is a lot of wording there about control.
SPEAKER: Hello. Thank you for the presentation. I think we need to rethink and reframe the whole question and see if such an effort is even valid. The name space is a shared and limited resource, so that concern is of course there, but this whole infrastructure discussion is about resolution, which is not a shared and not a limited resource; it's basically more options for people. And the first question, I think, is: do you, or anybody, have the right to regulate where there is already free choice for citizens? If I want to go to a website, I have every choice to switch to Google or whoever, and I can switch back. So I think we first have to start from there; that would be a very good starting point for such a task force. I fully support having a task force on that.
Because I think we are trying to regulate somewhere the regulator doesn't have the right to regulate, and by centring that control we are forgetting that it would create a set of elite people who say that citizens of the EU don't understand and that we should decide for them, whereas we have every choice. Of course one person might not make a change, but if something goes wrong and my resolver doesn't resolve, I have a choice to switch to another one. If I worry about my data, I might not want European resolvers to have my data. And nobody gives them the right to regulate where there is not a shared and limited resource. I think we have to start the discussion from there.
MORITZ MULLER: From Meetecho, Marco d'Itri says: the actual question is why are consumers using non‑ISP resolvers, which have downsides? My non‑scientific guess is that this happens because their ISP resolvers are (a) censored, (b) provide unwanted ads, and (c) are unreliable.
JIM REID: Nice work as always, Joao and Geoff; the stuff you are doing is fantastic. Your suggestion about doing some kind of BCP or something along those lines is a good one. The problem I have is that in this Working Group we don't have a lot of representation from big operator networks that have a large number of eyeballs using big infrastructure. We'd need more of that kind of input if we were to produce such a document, and if we could encourage those people to come here, that would be great.
The second point, I think, is more of a layer 9 plus issue, which is that, to me, the objectives of DNS for EU seem rather unclear. We're not really sure what the intentions of the Commission are about this and what the long‑term goal is going to be. Is it going to be a regulation at some point in the future? Is it going to be an opt‑in service? Are they going to police it? And if so, how? Those are questions I think need to be asked, and in some settings we should challenge the Commission on what their thinking is. That's more of a role for the NCC's government engagement folks, but we might need to start articulating what those questions and concerns would be; that might be something else this Working Group could take on.
JOAO DAMAS: This suggested piece of work doesn't preclude the governance side from having its own. It's just making it easier: you don't have to think so hard about this, if your stated goals are your real goals, because here is an alternative that was actually vetted by the industry.
JIM REID: Just one other point ‑‑ there are now a bunch of these so‑called protective DNS services that are being used, particularly in the public sector in various countries; maybe that's another part of what the EU is thinking. I don't know.
SPEAKER: Benno Overeinder, NLnet Labs. The good thing about speaking after Jim is that he says a lot of sensible things, so I can be short. I think it's good to think about a task force, so this is a plus one from me. I'm not sure we will be successful ‑‑ again, reiterating what Jim said, we also have to get other parties involved ‑‑ but I am happy to help, give input, etc. So I think it's good, and it also reflects my discussions with other operators and people at CENTR. Thank you for the presentation.
(Applause)
MORITZ MULLER: Now we have another remote presentation, from Adiel, about KINDNS.
ADIEL AKPLOGAN: Thank you very much for having me for this session. I'll start by saying that I'm not David Huberman; I am doing this presentation for him, but this is an initiative that I am leading at ICANN.
This is something that we consider as one additional tool to promote best practices and help secure the DNS ecosystem in general.
Next slide please.
So, this is called KINDNS: Knowledge‑Sharing and Instantiating Norms for DNS and Naming Security.
The idea behind this is to work with the community to streamline best practices that we consider critical for secure DNS operations. We want to focus on the operational aspect, and this came basically from feedback my team and I have gotten from DNS operators in general. You all know about the DNS camel initiative, which noted that there are more than 200 RFCs that talk about the DNS and a huge number of pages to read to understand the DNS in general. Our goal is not to make everybody a DNS expert; the idea here is to identify the most critical practices that we can encourage operators, big or small, private or public, to implement, so that they operate a DNS that is secure and does not become the weakest link of the overall system.
We then started looking at the different documents that talk about DNS best practices, and talking with people who actually operate DNS on the ground, to identify practices that we can promote and highlight. We usually say, you know, the 20% of practices that can help us achieve 80% of a secure DNS operational environment.
So that is where this idea came from, and we have been working on this since last year, via mailing lists and working with the community and an external expert as well, to help us streamline those best practices.
The key components of the initiative, as I mentioned, are first to identify the critical best practices we want to focus on, engage the community on those practices, refine them, and also allow them to evolve. This is not something that is cast in stone forever; it will evolve as the protocol evolves and as operational practice evolves.
Once we had identified the set of practices, we published them on a website and added a self‑assessment component as well. That self‑assessment component will allow operators to assess themselves against those practices, see where they fit, access the implementation guidelines that we will also publish so that they can correct what they are not doing well, read more and implement. And after getting through the self‑assessment, they can enrol and participate in KINDNS, and participation means becoming an ambassador of those best practices, promoting them and also voluntarily committing to implement them and keep them operational on their DNS service in general.
Then we will also have a phase where we identify a few indicators that can help us see if these practices are having a positive impact on the, I will say, health, quote unquote, of the DNS in general. That will be a continuous measurement. We will try to streamline those measurements and focus on what we can measure from outside, but also involve the people who are participating in some measurements that they can do themselves, because obviously we cannot measure and see everything perfectly from outside.
So, those are the three components we are focusing on right now. But the idea is to later extend this to the procurement part of the domain name, which is more related to ICANN's mission: registry, registrar and registrant best practices.
So, we have identified in total five categories of DNS operators for which we will document practices. Within the authoritative operators, we have operators that manage TLDs or critical zones. By critical zone ‑‑ a participant will self‑assess themselves as a critical zone manager ‑‑ I mean a zone that is critical for the management of the TLD itself, like nic.TLD for instance, or any other second‑level domain, related to health or banking or insurance for example, that can identify itself as critical.
Then we have the single domain managers, second‑level or third‑level domains depending on which TLD we are talking about; they will have their practices as well. And then we have the resolvers, and there we have three different categories that we are covering.
The first one is closed and private resolver operators. That will mainly cover resolvers that are run within companies, serving exactly the users of a specific company or a specific community.
Then we have shared but private resolvers. That is mostly what you see at ISPs: they are private but shared among a limited or known group of users. And then we have public and open resolvers in general.
So for each of these categories, we have identified a set of practices that can be applied to them.
Now, at the bottom of that, we also have a set of practices that go beyond operating the DNS itself; they are about the system. This came in not as part of the core of KINDNS, but obviously, if you don't have a strong platform on which you are running your DNS, you can secure the DNS as much as you want and the overall security will still not be effective if your underlying system is not secure.
So we have the hardening of the underlying systems, which applies to most of these different categories, and operators can self‑assess themselves against some of those practices as well.
So, basically, those are the categories that we are covering, and as I mentioned, an operator can join or participate in one or several of these categories; it's a voluntary engagement.
Of course, when we started this, there were some questions about the relationship between this and ICANN's remit, which is more about policy related to DNS registration. As I mentioned, we are not touching the procurement part of the domain name here, but rather the operational part, which makes this a purely voluntary engagement and commitment by operators in general.
So, we'll quickly go through some of the practices that we have identified so far and which we are working on. For instance, for the authoritative DNS operators of critical zones, we have so far identified eight practices that are going to be documented and which we are going to ask any participant to adhere to and commit to. Implementing DNSSEC is one of them, of course; making sure that zone transfer between authoritative servers is restricted, so having a way to ensure the integrity of the zone file; and authoritative and recursive functions must not run on the same server. Those are simple things that anyone who runs DNS seriously knows about, but we're trying to frame this in a way that people can quickly have access to those practices.
For instance, for TLDs and critical zones, we are suggesting that there are a minimum of two distinct name servers, that they ensure there is diversity in their infrastructure, including the software packages they use on the different authoritative servers they run, and that they monitor their service continuously.
I have three more minutes.
So, we have a set of practices like this for each of the categories I mentioned on the previous slide. For SLDs we have seven practices; for the closed and private resolvers we have a set as well. For each of those categories, our goal was not to go beyond ten practices, so that we don't overwhelm operators with too many. That doesn't mean that we are not going to talk about the other practices, because we plan to have a section of the website that covers other references and other things you may want to do.
We have eight practices that are pretty much focused on the system in general.
So, the self‑assessment and enrolment. Anyone can take the self‑assessment; it's going to be open and available for anyone to take, and based on the outcome, you can then decide to join KINDNS or not.
The self‑assessment will generate a report that anyone can download. The report will also point to the practices and to guidelines on how to implement the practices you feel you are not implementing yet. That is just a guideline.
And it can be fully downloaded after you take the assessment in general.
So all this is going to be published on a dedicated website that is being developed right now. Our goal is to launch this sometime by the end of this month, if everything goes well. Right now what we are doing is mainly streamlining the practices, and we have a mailing list for this that I will invite you to join if you want to contribute, which is kindns-discuss@icann.org, and there is a wiki page we maintain where we put all the information related to this, temporarily, until we have the formal website.
Thank you very much for your attention and I'll be very happy to answer any question relative to this initiative.
MORITZ MULLER: Unfortunately due to time constraints we cannot take any questions, I am sorry. Now the next speaker.
FLORIAN OBSER: Hello. I am Florian from the RIPE NCC and I will give you the DNS update.
Hosted DNS: this is where we, with the help of the community, expand the footprint of the NCC's AuthDNS anycast service and K‑root. We have been doing this for K‑root for many, many years; we have 87 hosted sites there, and they handle about 50% of the total traffic of K‑root. For AuthDNS we introduced this more recently, I think about a year ago, and there we have a total of 7 instances; since the last RIPE meeting we got 3 more, one in Salzburg, Sarajevo and ...
The traffic statistics look like this: they cover about 10% of the total query rate. To put numbers on it, we're seeing about 120,000 to 160,000 queries per second.
Next one: DNSSEC. We decided to switch to a combined signing key, and the reason for that is that the split into a KSK and a ZSK just means more moving parts. Both keys were stored together on the signer anyway, so if one key is compromised, the other one is automatically compromised as well, so the split wasn't buying us anything. The other thing is that with algorithm 13 the key size is the same for both roles. Previously we had algorithm 8, where especially the ZSK had an RSA 1024‑bit key size, and with that choice you need to roll the keys quite fast.
Algorithm 13 gives much better and stronger cryptography, so we should be fine there as well.
To switch to a combined signing key, you do a standard KSK rollover; in the Knot DNS signer it's just one configuration option. We tell it to do its magic, it waits for us to update the DS record in the parent, and, well, this was a non‑event, which in DNSSEC is a good thing to have. It was really boring.
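For illustration, the Knot DNS signing policy option he appears to be referring to is the single‑type signing scheme; a minimal sketch of such a policy (algorithm choice and identifiers are illustrative, check the Knot documentation for your version):

    policy:
      - id: csk
        algorithm: ecdsap256sha256   # algorithm 13
        single-type-signing: on      # one combined signing key instead of a KSK/ZSK split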
If you look at the zone, it looks like this. You will see that there is one DNSKEY that signs the DNSKEY RRset, and it's the same key that signs all the other records that are in there.
A request from the community was: could you please lower the TTL on the NS delegation records and on the DS records. These used to be two days; we lowered the NS records to a day and the DS records to an hour. We did not see any spikes or increase in the query rate on the AuthDNS cluster. Especially for the DS record, a very high TTL is a problem: if you are doing a normal KSK rollover it's just annoying, but if you have an incident and you need to wait two days, that's really, really bad. So this should improve things.
Software update: we're using Zonemaster to do pre‑delegation checks. There is a new version out; you get bug fixes and a few minor additional checks, and what we were really looking for was support for algorithms 15 and 16, so we have support for those now.
So, have you recently disabled IPv4 on something? We did. The reason was that we were actually running out of address space for the management interfaces at one site, and we looked at how we could solve this. We could run this on private space ‑‑ hang on, there is IPv6, let's do that. And, well, it just works, except at some sites where we only have v4 and we need to poke people, or where v4 is actually better than v6; we need to work on that as well. But the idea is to switch as much as we can on the management interfaces to v6.
We are a bit late on the hardware replacement; basically we can't get the hardware in. We have been promised that stuff will show up in June, so we hope to be able to roll this out this year. These are the K‑root core sites in Frankfurt and Miami.
Another project that we're working on is the AuthDNS core sites in Europe; we want to put in a fourth site, where we will also face the issue that we need to get the hardware in. We aim to do this by the end of 2022. I am a bit more of a pessimist, so I think it will be slightly delayed, but we're working on that.
And I think this is actually the last slide. We try to be diverse in the software that we're running, so our name servers use a mix of BIND, Knot and NSD. We also have some diversity in our routing software, which was Bird and ExaBGP; we dropped ExaBGP and were running on Bird for a long time, and to add to the diversity we have now also added FRR to the mix. And we need to do an OS update, because CentOS is end of life soon.
And I don't know if you have time for questions.
(Applause)
MORITZ MULLER: You were very quick. Thank you, so then we have some time for questions.
No questions, I believe. So, then we have a small surprise still. So Shane, the floor is yours.
SHANE KERR: So, a little bird told me recently that he was doing some archive work, archeology, looking into the historical record, and it turns out that in June, next month, the DNS Working Group is going to be 30 years old. So we figured we'd have a little birthday celebration. We have some cake over here. Happy birthday, everyone.
(Applause)
MORITZ MULLER: Thank you, see you next time.
(Lunch break)
LIVE CAPTIONING BY
MARY McKEON, RMR, CRR, CBC
DUBLIN, IRELAND