Anti‑Abuse Working Group
Thursday, 19 May 2022
At 9 a.m.
MARKUS DU BRUN: So, good morning, everyone. It's 9 o'clock. This is the Anti‑Abuse Working Group. I thought, this being the main room and this being a networking meeting, that this should probably be Routing, but if you thought it would be Routing, it's in the side room.
So, my name is Markus Du Brun, I will be your co‑chair for today; the other co‑chairs are Brian Nisbet and Tobias Knecht. Unfortunately, neither of them was able to attend the meeting but luckily there is plenty of staff around to help me out. So thanks to Chris for being our scribe, thank you for the stenography, that is always really helpful especially for people like me who are not native English speakers. Then of course there's the technical staff running Meetecho and providing all the tech in the room, thank you as well, and this time, there's also Anna Wilson, a colleague of Brian's, who is helping me with monitoring the Q&A and the audio video queue. So if you have any questions during the talk or want to give some feedback, then you can either use the Q&A on Meetecho or you can use the video audio on Meetecho: you can use your Meetecho link to insert yourself in the queue, and then hopefully you will show up in the queue and you get your turn to speak.
We will not be monitoring the chat so if you have any questions, please use the Q&A. If you want to give us feedback for the presentations, talks, interactions we have today, you can send us an e‑mail to the Working Group Chairs list or you can use the rating mechanism on the RIPE page, either is fine. And I think that's about all the things we have for housekeeping. So the next thing we would have to do is to approve the minutes from the last RIPE meeting, RIPE 83. The minutes have been sent to the list, there haven't been any comments, so if there aren't any last minute comments now, then I would declare the minutes approved. So thank you. And that is out of the way.
There have been no last minute agenda changes so the agenda is set as well. And we can continue with an update on the recent list discussions. There have been a few topics but not a lot of discussion. Most of the mails were on the abuse handling training; this has its own agenda point in interactions so I am not going to talk about this, and the others were on questions regarding abuse‑c and ‑ filtering, but as I said before, there hasn't been much discussion and there's nothing in particular that I would want to raise, but if there's anything you would like to talk about regarding the recent discussions, then this would be the time. Again, no comments. Great. Then we can move on to policies. This is also quick. There are no policies and we can start with our interactions. The first one would be the aforementioned anti‑abuse training, and Gerardo, you are up.
GERARDO VIVIERS: I have a couple of slides here. Good morning, I am not your quiz master this morning. I am here in my capacity as training material developer and I want to give you an update on the anti‑abuse training that we are developing on request of the Anti‑Abuse Working Group.
So, we had ‑‑ how does this work? ‑‑ I got it. So, since the last meeting, when we presented this project, we had two rounds of feedback that we opened up for the community to participate in. We had written some learning objectives and an initial design, and from the first round of feedback I got a lot of input, and it was actually well attended when we did a Zoom meeting, and with that information we were able to improve on the learning objectives and the whole webinar design.
For the second feedback round we only opened it up for online comments, we didn't have a Zoom meeting, and unfortunately that wasn't so well attended, or there wasn't so much participation, so I am taking it that silence means consensus, but I would really like to hear your voices, we are listening for your input, and I would like to know later on, if you want to tell me, how it is that you would prefer to participate: if you would like an asynchronous online way, or do you want a one‑time Zoom meeting where we can all join and talk one at a time together. Well, if you have any other creative ideas, we are open to listening.
So this is the rough outline of what the webinar is going to be like. We are going to start with an introduction to the topic of abuse and abuse handling, what it is. We are going to then cover the abuse handling principles as we've heard from the community as best practices, and of course, the abuse handling process. And we want to finish with the guidance ‑‑ a guidance to implement and run an abuse desk. Now, I have been hearing a lot from community members that they want us to say you have to handle the abuse, to point the finger and say you are not handling your abuse. But as much as we are not the Internet police, we are also not the manic Internet preachers telling people what to do or not to do, so we want to structure this webinar in a way that motivates people to care about abuse handling and gives them some tools so they can actually do it, and we think that, this way, we will create an attitudinal change in people, so that they will take care of their abuse complaints.
Coming up is, well, we are working on the webinar slides already so we are putting the content on to slides and giving it shape. We are going to have hopefully, we plan for this, a first iteration by the end of quarter 2, and we are going to open it up again for feedback to the community to let us know if the content that we are using is correct and if you have any other things that you would like to add.
If you have any questions, I'm here, and otherwise you can reach me at my e‑mail address. I have no special sound effects right now. No? Good. Well, then see you online.
MARKUS DU BRUN: I do have a question for the Working Group. So how do you think we should proceed? I think Gerardo would really like to get more feedback and on the ‑‑ so now he is working on the slides so there's probably ‑‑ now is not the time for feedback but if the ‑‑ at the end of Q2, you said?
GERARDO VIVIERS: Yes.
MARKUS DU BRUN: How should we proceed? The most feedback he received during the Zoom session we had, is this ‑‑ is this the way you want to do it or... we should probably ‑‑
GERARDO VIVIERS: One at a time, please. I think the online Zoom meeting was better attended than the online asynchronous collaboration, so I guess we will organise another Zoom meeting.
SPEAKER: Why not, why not just proceed with the Zoom stuff?
GERARDO VIVIERS: The thing is, I don't want to impose my time on other people's time, so by opening it up and making it asynchronous I thought people could join in and collaborate whenever they felt like it. Yes, we will do an online Zoom meeting, I just wanted to know what the preference of the Working Group was.
MARKUS DU BRUN: Okay, thank you.
MARKUS DU BRUN: Next up we have Graeme Bunton from the DNS Abuse Institute, who wants to give us an update on the DNS abuse reporting tool.
GRAEME BUNTON: I do have slides, I haven't previously uploaded them, let's see if I can do that on the fly. Maybe I will just talk you guys through this and I can share them after the fact.
So, good morning, thank you for having me, I spoke with this group back at, I think it was RIPE 82 about this DNS abuse institute thing that I'm the executive director of, and it seems like the right time to come back and share a bit of an update.
Maybe for context, the DNS Abuse Institute is an organisation that was created last year by PIR, Public Interest Registry, who run .org, and they saw abuse as a sort of complicated global problem that we were not going to solve within the domain registration industry at the individual registry or registrar level, and that some sort of collective solution and centralising function was necessary if we were going to make any headway on the issue, and so the institute was created and here I am running it.
We have sort of a couple of key pillars that we are working on, education, collaboration, innovation, and really the sort of raison d'être for the institute is to identify the places of friction and complexity for dealing with abuse within the sort of Internet infrastructure ecosystem and see if we can move those places of friction into the institute itself to enable, you know, Internet infrastructure providers to action abuse more easily.
So that is sort of broadly the mission.
And you know, we took a pretty long look at the various activities that were available to us, and decided that one of the key pieces that was missing from, let's call it, the Internet in general, was some sort of centralised abuse reporting function, and that's because we see, broadly, two problems. One, that reporting abuse is hard for the people who wish to do so, especially on the domain registration side of things: you know, there's an awful lot of players, there are no standards for what evidence is required, there's no consistent implementation of the process to report abuse, there's technical knowledge required, like people need to be able to identify a domain registrar, which not everyone can do, and it's even harder to identify a host, so that sort of user experience is pretty brutal. On the other side of things, what's coming in to people who could mitigate abuse is just awful too, and I am sure many on this Working Group webinar, call, experience that too: the abuse reports people are getting in the door are duplicative, unevidenced, unstructured, often not your domain, your IP, your infrastructure; they are unactionable, so people are spending lots of time and energy triaging abuse tickets that provide no value, they don't make the Internet any safer, they are not reducing issues on your platform, and so it's just a whole bunch of wasted time and effort. And we thought that we could build something and plonk it in the middle of that and clean up a side to that problem, and we are really close to doing that. So we have built a tool, we call it NetBeacon, it is launching relatively shortly, in the first week in June, and it's not coming right after the spaces that this community operates in, but it's close, and so I wanted to sort of come today and share this idea and see if I can get some feedback and open up some connections.
So, the first thing to say is that this is not a commercial endeavour; this is a free service that we are providing as part of Public Interest Registry's not‑for‑profit mission, to essentially make the Internet better. So this is what we are doing.
So we are setting up a website that makes it very easy to report abuse, with standardised forms for different types of harms. We have implemented standardised requirements for those abuse types, because, you know, people require certain elements, hosting companies or registries, registrars, whoever, for particular ‑‑ like for malware or phishing or spam, and we can ensure that those elements are captured. We are standardising the reports that come in into X‑ARF, so we can display those in human readable form, if that's your jam, or natively. Then we are doing something interesting, and this is where I think this tool really will provide value for everyone: we are providing what we call enrichment. So someone submits an abuse report into this centralised system, really it's an abuse intermediary, and then we are enriching it. So we standardise it and then we take that, right now, domain name URL, eventually full URL, IP address included, and we bounce it off a whole bunch of different API‑based sources of domain intelligence, so that could be like Hybrid Analysis from CrowdStrike, block lists from Spamhaus, abuse.ch, there is an awful lot of those, and we are prioritising which ones we are integrating with. So we then have the abuse report that the person submitted, but we have appended to it a bunch of what we think will hopefully be useful information. The goal there is to move some of the investigatory burden from the frontline anti‑abuse person or compliance person or support person, whoever is on the receiving end of these abuse reports, and move it into this tool itself, so what they get is a well structured, well formatted, readable abuse report with realistically most of the information that they could go and get themselves. And we are hoping that this reduces the time to action and makes that choice to action abuse very easy.
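As a rough illustration of the enrichment step described here, the following minimal Python sketch attaches the results of several lookups to a standardised report. Everything in it is an assumption made for illustration: the `AbuseReport` fields, the `enrich` function and the two stand-in lookup functions are hypothetical, and none of it reflects the tool's actual data model or any real provider's API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

# Hypothetical lookup signature: takes a domain, returns a dict of findings.
Lookup = Callable[[str], dict]


@dataclass
class AbuseReport:
    """A standardised report; the field names are illustrative only."""
    abuse_type: str   # e.g. "phishing", "malware", "spam"
    domain: str
    evidence: str
    enrichment: Dict[str, dict] = field(default_factory=dict)


def enrich(report: AbuseReport, sources: Dict[str, Lookup]) -> AbuseReport:
    """Append the result of every intelligence source to the report.

    A failing source is recorded as an error instead of blocking delivery,
    so the recipient still receives the original report.
    """
    for name, lookup in sources.items():
        try:
            report.enrichment[name] = lookup(report.domain)
        except Exception as exc:
            report.enrichment[name] = {"error": str(exc)}
    return report


# Stand-in sources; a real deployment would call external APIs here.
def fake_blocklist(domain: str) -> dict:
    return {"listed": domain.endswith(".example")}


def fake_sandbox(domain: str) -> dict:
    raise TimeoutError("source unavailable")


report = enrich(
    AbuseReport("phishing", "login.bad.example", "screenshot of fake login page"),
    {"blocklist": fake_blocklist, "sandbox": fake_sandbox},
)
print(report.enrichment)
```

Keeping each source behind the same small function signature is one way to make the enrichments individually selectable per recipient, which is a feature mentioned later in the talk.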
And then ‑‑ so we have got this now standardised, enriched abuse report, and then we are automatically distributing it right now to gTLD registry and registrars because they are practically pretty easy to identify, and ultimately and, this is why we are having this conversation, hosting companies, CDNs, you know, the IP community in general, so that we can begin establishing escalation paths and routing the right types of harms to the right places.
So, a couple of other features it's worth talking about. You know, one is that the forms we are using to do this are pretty straightforward. My experience, coming primarily from the domain registration industry, is that registrars and registries are terrible at implementing abuse forms. I have looked at them all, they are all pretty bad, it's just not a thing they are good at, so that's just fine, we will do that for them. And probably that's worth noting: not only will abuse reports be able to be submitted into this thing via the sort of public facing web page that will be at netbeacon.org, but we have built these forms to be embeddable, so that if anybody within the ecosystem wants to be able to get abuse reports from this form, they can embed these forms on their website, essentially white labelling them; people can fill them out, the abuse report gets standardised again and enriched again and sent back to themselves, they don't have to go through the work of trying to craft useful forms, and they get all of this tool for free.
I don't think I need to go into the X‑ARF component, or at least not in any real depth, but just know that it's based on ‑‑ Tobias's company I believe developed it, and because we are distributing it in human readable form or, if you want it, in X‑ARF, you can build on top of it. So you can get these abuse reports into your system, recognise them as coming from NetBeacon, and on top of that you can automate and say it's got that enrichment or these five sources of information in it, they all flag this, you know, URL as harmful, I am going to suspend hosting or IP address or domain name, whatever it is you choose to do, but you can build on top. There are a couple of other neat bits and pieces that I won't belabour too long. Those enrichments are selectable so that you can choose which ones you find particularly valuable; maybe you don't like Spamhaus, you don't have to get them, you create an account and log in and choose which pieces of information you find the most useful. You can specify what end points you want your abuse reports at, so you could specify a number of different e‑mail addresses, so malware could go to malware at, or you can consume these reports via API, and we do allow some batching so that you can batch abuse reports up to 24 hours. But really, we are trying to get people to be responsive quicker.
There is an API for submission into this NetBeacon, into this centralised abuse reporting function, so in theory people can do that automatically. It's unclear at this point exactly who we will enable access to, we don't want to flood anybody with abuse reports and not everyone is capable of producing them at scale and at quality, so we'll keep an eye on that, but there is also an API for consumption, so if you want to get your reports via API you can do so. I have talked about the available forms. One of the other things that is worth mentioning is that we have enabled people who receive abuse reports to label their reporters, so if you have people frequently reporting through the tool to you, you can apply a free text label to them. So someone could be, you know, law enforcement or local law enforcement for example, you could give them that label inside NetBeacon and then that will prepend that label to any subsequent ticket or e‑mail you get from the platform. There's also the ability to reflect a bilateral relationship, so you maybe have a trusted notifier relationship with, say, IWF, that's maybe a very heavy example, pick something lighter, you might have someone that you trust very much and you action their abuse reports essentially automatically; there's a function to enable the by ‑‑ or to flag that bilateral relationship to help you triage your abuse tickets a little bit easier.
So ‑‑ and I understand this is probably a little bit awkward because I am just talking about it and I am not showing it, but bear with me, it comes out in the next two weeks or so.
What it's not is important. It's not an abuse management tool. We are not helping you manage your abuse tickets. It's not a ticketing system. It's not trying to fill that void. It's really about getting these abuse reports into the ticketing system that you are using, which could be e‑mail, it could be, you know, an abuse‑specific platform, like the ones from CleanDNS, iQ Global or Abusix. It does not make determinations for you, we are not trying to tell you what is or is not abuse. What we are trying to say is: here is what has been reported and here is all the information you could reasonably expect to get for that specific abuse. We are not storing reports, we are not trying to build here a repository of all the abuse that has been reported through the platform. We are in the final stages of finishing the data retention policy, so we can make sure people aren't using it maliciously, probably something like 30 days. We will keep aggregated statistics, but I don't think the underlying details; you would not be comfortable, we are not comfortable with all the risks those details would entail either. We are not storing forever.
The timeline is that it is publicly available on June 1st. The recipients are going to be limited at that point to gTLD registries and registrars, again because they are the easiest to identify. And we are specifically only accepting abuse reports for malware, botnets, phishing and spam, and then, and this is part of why I am having this conversation here today, we need to do a couple of things. One, we need to increase the number of harms that we are accepting reports for, because lots of people accept, you know, abuse reports for many other things, intellectual property issues, scams and fraud, etc. And we also need to make sure that we are including the appropriate actors, and this is the interesting next step, which is, as we look at including hosting and CDNs and e‑mail service providers, that we are building the right escalation paths for abuse, so that an abuse report can come in and we could say oh, this doesn't belong at the registry or registrar yet, this should really go to the host, so we can route that abuse report to the host, and we are able to check and see if that harm still exists and then at some point escalate perhaps up to the registrar and then even perhaps up to the registry. By having this centralised function that NetBeacon provides we are able to route abuse appropriately and escalate appropriately, and I don't think anything like that exists in the marketplace right now. And I'm pretty excited about what that means for the Internet. I think we are going to be able to clean the Internet up in a sort of sane and responsible manner.
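The escalation idea outlined here (host first, then registrar, then registry, re-checking whether the harm still exists before each step) can be sketched roughly as follows. This is only an illustration under stated assumptions: the ordering constant, the function name and the decision rule are hypothetical, and the talk does not specify timings or how the re-check is performed.

```python
from typing import List, Optional

# Assumed escalation order, taken from the path described in the talk.
ESCALATION_PATH = ["host", "registrar", "registry"]


def next_recipient(already_notified: List[str], harm_still_present: bool) -> Optional[str]:
    """Return the next party to notify, or None if no further step applies.

    The report goes to the first party on the path that has not been
    notified yet, but only while the reported harm is still observable.
    """
    if not harm_still_present:
        return None  # harm is gone, nothing left to escalate
    for party in ESCALATION_PATH:
        if party not in already_notified:
            return party
    return None  # everyone on the path has already been notified


print(next_recipient([], True))                      # first notification
print(next_recipient(["host"], True))                # host did not act
print(next_recipient(["host", "registrar"], False))  # harm was mitigated
```

Because the central function keeps the list of who was already notified, each later recipient can see that earlier parties were contacted, which is the record-keeping benefit raised again in the Q&A.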
We need to integrate ccTLDs which are difficult, but we are working on that as well.
And I think that's it. So my ask for this audience is, you know, get in touch if you find this interesting. I'm always very interested to hear what sources of domain‑related information you find the most useful. If you are investigating an abuse report, what sources do you check and can I integrate those? I'm interested in understanding how people feel about escalation paths, what harms go where first, what do you see your role in that being? And then hopefully, you know, the ask is that when we're operating in your space, say if you are a host or registrar, you would create an account and integrate, and that integration is very simple. My expectation is not that people will be in every day, it's sort of a set it and forget it: you would log in and claim ownership of your host or your registrar, what have you, the entities that we're able to recognise, you would set your end points, perhaps some labels, and that's it, you know, the abuse reports will just flow. And so I think that's really all I wanted to share. I am very excited about, you know, where this tool is going and the potential to have this sort of non‑commercial centralised function routing abuse appropriately for the Internet, and hopefully you guys are too, but I will stop there and see if people have questions or comments.
ANNA WILSON: Thank you, we have an ‑‑ Anna here, we have a couple of questions. Let's go to the microphone first.
SPEAKER: Patrik Tarpey from Ofcom. You will know, Graeme, that the European Commission made an announcement I think about eight days ago about combatting child sexual abuse material and probably terror content, and meanwhile in the UK there's a draft bill going through parliament that does something similar around what are described as illegal harms, including child sexual abuse and terror. How are you going to cope with the different jurisdictional asks? For example, in the EU and UK you might have one regime in play for abuse where this system could be exceptionally useful, whereas in other parts of the world there's more, if you like, constitutional tension with freedom of speech, etc. How are you going to differentiate ‑‑ because it's very probable that registrars and other people in the DNS ecosystem may have some form of liability or responsibility to remove content?
GRAEME BUNTON: Thanks, Patrik. So, a key piece of this is, one, that it's really about disruption and not necessarily a full‑blown law enforcement investigation, so if it's LEA and there's some other component, like a formal thing happening, they have their own tools and mechanisms and this really wouldn't be about that. We are not trying to fit this into any particular regulatory regime. It's really going to be about capturing the broadest set of harms and ensuring that those go to the right people. And so I am less concerned about any individual organisation's responsibilities, and broadly, because the sort of illegal floor tends to be higher than what we are capturing, I think we are in a pretty good place to manage those harms across ‑‑ essentially the world. Is that at all helpful?
ANNA WILSON: Next question from Daniel Mahony in ISC. "One of my spam sources is a large difficult to contact free mail provider who refuses mail to abuse at, many abuse sources also bounce abuse at or unaware of abuse at, how does this solve that?"
GRAEME BUNTON: That gets into the ‑‑ apologies, my phone is ringing and I am at an Airbnb and I have no idea how to turn it off and it's going to go on for a long time. So the ‑‑ what I think that question gets at is this escalation path and having this NetBeacon tool ‑‑ that phone is the worst ‑‑ literally, it's ‑‑
ANNA WILSON: Do you need a moment to deal with it?
GRAEME BUNTON: It's going to stop any moment, I'm sure, I have lost my mind ‑‑ so, I think the problem described on the abuse at and the uncontactable people is common, and you know, obviously very frustrating, and that's where having this sort of centralised function is helpful, I think, so that if we've routed an abuse complaint initially to this, someone who is ignoring their inputs, we now have an authoritative record that that has been done so, and one of the key pieces from people that's been missing is I have tried this and nothing happened; well, people aren't always honest about that, and that becomes really difficult for someone deeper in the stack to verify, so then they are doing this whole process where they are trying to notify maybe, maybe, and so that gets really ‑‑ it takes a long time, requires a lot of reverification and that gets problematic. So now, now, when we've fully baked this thing we get into a place where oh, no, we know we have submitted it to that mail service provider, we know they haven't done anything for two weeks, we know that we gave them this information, now we are escalating that further up the chain, is that their registrar maybe and now the registrar is going to take action because not only do they have evidence but they have a record that they have been recalcitrant, I think there is some interesting potential about being able to tie that together deeper into the stack of the Internet.
ANNA WILSON: Thank you. Any other questions? I am not seeing any. Thank you very much, Graeme.
MARKUS DU BRUN: So can we get the agenda slides back on, please. Thank you. So the next one would be the RIPE Database Requirements Task Force report. As you know, there was a RIPE Database Requirements Task Force that concluded last year, and they issued a report that was also published last year, I think in November, and it contained a set of recommendations that are now being worked on in the appropriate Working Groups, and there was also one non‑recommendation ‑‑ no, it was actually not a non‑recommendation, there was no clear consensus in the task force on how the topic of contact information should be handled, so, contact information being published in the RIPE database. If you have a look at the report, there was a list of pros and cons, and, yeah, this falls into this Working Group, being the primary contact to law enforcement, so my question to you is, how should we proceed with this topic? Because it's ‑‑ well, the data ‑‑ the task force not reaching a consensus and a recommendation already suggests that this is not an obvious way, that there is no obvious way which one can take, so how should we figure out how to proceed with this?
So this hasn't been sent to the list before. My primary target for today is to make you aware of this, because, as I said, this is probably not going to be a straightforward task; we have to take it to the list anyway, but if there are already some ideas on how we could tackle this problem, then please let me know. No ideas so far. Okay. As I said, we are going to take it to the list anyway. So from my side, that's all on this topic, and we can probably start with our two presentations. The first one is from Matthias Wichtlhuber on countering DDoS attacks.
MATTHIAS WICHTLHUBER: Hello, everyone. It's me again. I'm going to give a second talk here, not about hardware measurements but about how to counter DDoS attacks with comprehensive ACLs learnt from blackholing traffic. This is joint work with appreciated colleagues from DE‑CIX and from Brandenburg University of Technology, and part of a research project that we did over the last few years.
So, let me motivate this a bit. This is the Anti‑Abuse group, I don't have to motivate too much that DDoS is a problem, you are probably seeing that every day yourself, and when looking at the operators' toolbox for how to counter attacks, you essentially have two tools at the ends of the cost and complexity scale. On the one hand, you have remote triggered blackholing, a mechanism to drop traffic upstream. Simple and cheap. Unfortunately not very effective in many cases due to different problems with Director of Public Prosecutions. It's working on the target IP and if it works at all it takes the target off‑line so attackers reach their goal.
On the other hand you have full‑blown DDoS mitigation solutions, either in your network or somehow rerouting to some scrubbing centre. They are usually costly and complex to set up, but they are highly effective because they are looking at the payload and managing to keep the target online. The alternative, not often used because it's not entirely easy to get it done right, is access control lists. They are simple and cheap because you can run them on your own box, effective if done right because you can investigate the layer 3 and 4 headers, and they keep the target online. The only thing is, what do I fill into these ACLs? What we were really missing is a comprehensive list of ACLs that cover the most relevant vectors, and comprehensive means multiple hundred ACLs.
So, you don't want to compile that by hand, so we thought about whether we could do this in an automatic fashion, and the good thing is, IXPs have very good visibility of remote triggered blackholing traffic because we see all that inter‑domain traffic, and when we are looking at our route servers, quite a lot, and obviously the networks that have announced these blackholes are signalling they don't want that traffic, so it's a no‑brainer that this might be a good source for trying to automatically derive ACLs out of the traffic. So, the idea here is, we are collecting sampled flow data, for instance IPFIX but it could be anything else, of remote triggered blackholing traffic, we process the data and then we are applying data mining algorithms and ‑‑ packet headers sent to blackhole via IXP customers, and as a service for the community we have published these on GitHub for you to use and we hope that you might find them useful.
So let's get a bit into the details of how we are doing that. So, the first step that we need to do with that data once it falls out of the routers is some pre‑processing, because blackholing flows are really under‑represented in the overall flow export. When we first looked at the dataset, we found that the share of blackholing flows in the flow data that we are looking at is much smaller than 1% of the overall traffic, and if you have ever worked with machine learning or data analysis you know that such an unbalanced dataset is really problematic for analysis. Also it's not desirable to have this large share of non‑blackholing flows because you don't want to store anything that doesn't need to be stored, right? So what we do here is, we are introducing a balancing procedure and we balance by sub‑sampling the non‑blackholing flows, so we get about an equal share of non‑blackholing and blackholing flows in the dataset, and that has the nice side effect that we can immediately throw away 99% of the overall data, which leaves us with a very small dataset that is very nice to analyse. Also, what we can do immediately is throw any personal data away, like IPs or whatever we find in there, so there's no personal data involved any more here because we are only investigating header data besides personal things like IP addresses.
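A minimal sketch of such a balancing step, assuming flows are plain dictionaries with a boolean `blackhole` label, might look like the following. The field names (`blackhole`, `src_ip`, `dst_ip`) and both function names are assumptions for illustration; the sketch balances globally by random sub-sampling, which may well be simpler than the procedure actually used in the project.

```python
import random
from typing import Dict, List


def balance_flows(flows: List[Dict], seed: int = 0) -> List[Dict]:
    """Keep all blackholing flows and an equally sized random sample of the rest."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    blackholed = [f for f in flows if f["blackhole"]]
    other = [f for f in flows if not f["blackhole"]]
    sample_size = min(len(blackholed), len(other))
    return blackholed + rng.sample(other, sample_size)


def strip_addresses(flow: Dict) -> Dict:
    """Drop IP addresses so that only non-personal header fields remain."""
    return {k: v for k, v in flow.items() if k not in ("src_ip", "dst_ip")}


# Toy export: 5 blackholing flows among 1,000 flows in total (0.5%).
flows = [
    {"src_ip": "192.0.2.1", "dst_ip": "198.51.100.7", "proto": "UDP",
     "blackhole": i < 5}
    for i in range(1000)
]
balanced = [strip_addresses(f) for f in balance_flows(flows)]
print(len(balanced), sum(f["blackhole"] for f in balanced))
```

In this toy example 990 of the 1,000 flows are discarded, which mirrors the side effect described above of throwing away about 99% of the data.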
This results in the following plot here for the dataset that I have done here. What this essentially shows you is that on the X axis you have blackholing flows per unique IP per minute, assuming that a non‑blackholing flow is ‑‑ the dataset is nicely aligning along the angle bisector, so you can see here the dataset is very well balanced after this procedure and ready for analysis.
So, now I need to do a quick detour into data mining, I am sorry but it's required to understand the remainder of the talk. So we are using algorithms from e‑commerce here, association rule mining, and I am showing you one example here. Let's assume Brian and Markus are going shopping online, names are coincidental, and each of them has three items in their shopping cart, right, so they are buying an obscenely large TV, a wall mount and a drilling machine, with the exception that Markus is not buying the drilling machine. What e‑commerce websites are doing at this point is analysing all the shopping carts, obviously a lot more than two, this is just an example here, and they are trying to find out what to recommend you to buy next; every one of us has seen that, every web store and Amazon is doing that, based on statistics over all of these shopping carts. A few examples are listed here, so for instance if I'm going shopping online and I have the large TV in my shopping cart, based on this example here you will probably recommend me the wall mount because you have seen this in 100% of all of the baskets.
Same as if I have a large TV and drilling machine probably also recommend me wall mount because you have seen this in 100% of all baskets. If I only have the drilling machine in my shopping cart, and you are thinking about recommending me large TV you would probably not do that, right? Because if I have only seen in 50% of the shopping carts and obviously there's no real relation between drilling machines and large TVs. Rules like these are called association rules and they can be mined very efficiently from large data sets so there has been 20 years of research into that. Obviously driven by a lot of commercial interest. And simply a way to identify clusters of co‑occurring items in the data.
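The shopping cart example can be worked through with a few lines of Python. This is a minimal sketch, not the tooling used in the talk: the function names `support` and `confidence` are chosen for illustration, and the two baskets are exactly the ones from the example. Here, support is the share of baskets that contain an item set, and confidence is the share of baskets containing the left‑hand side that also contain the right‑hand side.

```python
# The two baskets from the example: Markus skips the drilling machine.
baskets = [
    {"large TV", "wall mount", "drilling machine"},  # Brian
    {"large TV", "wall mount"},                      # Markus
]

def support(itemset, baskets):
    """Share of baskets that contain every item of `itemset`."""
    return sum(1 for b in baskets if itemset <= b) / len(baskets)

def confidence(antecedent, consequent, baskets):
    """Share of baskets with `antecedent` that also contain `consequent`."""
    base = support(antecedent, baskets)
    if base == 0:
        return 0.0  # antecedent never seen, so no evidence for the rule
    return support(antecedent | consequent, baskets) / base

tv_to_mount = confidence({"large TV"}, {"wall mount"}, baskets)
tv_drill_to_mount = confidence({"large TV", "drilling machine"}, {"wall mount"}, baskets)
drill_and_tv = support({"drilling machine", "large TV"}, baskets)
```

With these two baskets, both wall mount rules come out at 1.0, and the drilling machine together with the large TV appears in half of the baskets, which is the 50% figure mentioned above.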
So, what we are doing is, we are taking these algorithms and applying them to our traffic data. We are viewing the header fields as items in the shopping cart and we are trying to find things that often co‑occur together, header information that often co‑occurs, preferably with the label blackhole. How does that look in detail? We are looking at the headers with an algorithm called FP‑Growth that is implementing this method, and the question that we are essentially asking the algorithm here is: which header information often co‑occurs with blackhole? And you get rules like this out, for ‑‑ frequently used for DDoS, with a packet size between 1,400 and 1,500 bytes, and this often co‑occurs with the blackholing label in our data, so this is a candidate for an ACL in our results.
And also what is really nice about this algorithm, it gives you a couple of metrics that allow you to assess the quality of the findings of the algorithm. So, one thing, the left part of this association rule here is called the antecedent and the algorithm gives you a count of how often it was found in the training data, so the more often you find this, obviously the more relevant this is as an attack vector. The other part is confidence, so how often did the antecedent appear together with the consequent, as a share. This gives you an assessment of the quality of the classification, so 1.0 would mean here 100% confidence, I have seen this header combination always together with the blackhole label, and the lower it gets, of course, the less sure you can be that both are somehow related.
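To make the two metrics concrete, here is a small Python sketch that counts, for every combination of header fields, how often it occurs (antecedent support) and how often it occurs together with the blackhole label (confidence). It deliberately uses brute‑force enumeration rather than FP‑Growth, which only works for a handful of fields, and the field names, thresholds and sample flows are assumptions made up for the illustration.

```python
from collections import Counter
from itertools import combinations

def mine_blackhole_rules(flows, min_support=2, min_confidence=0.9):
    """Brute-force stand-in for FP-Growth: score every header combination
    as a rule 'combination -> blackhole'."""
    antecedent_count = Counter()
    joint_count = Counter()
    for header, blackholed in flows:
        items = sorted(header.items())  # field names are unique, so ordering is stable
        for r in range(1, len(items) + 1):
            for combo in combinations(items, r):
                antecedent_count[combo] += 1
                if blackholed:
                    joint_count[combo] += 1
    rules = []
    for combo, n in antecedent_count.items():
        conf = joint_count[combo] / n
        if n >= min_support and conf >= min_confidence:
            rules.append({"antecedent": dict(combo),
                          "antecedent_support": n,
                          "confidence": conf})
    return sorted(rules, key=lambda r: (-r["confidence"], -r["antecedent_support"]))

# Hypothetical, already anonymised flow headers with a blackhole label.
flows = (
    [({"protocol": 17, "src_port": 123, "packet_size": "401-500"}, True)] * 3
    + [({"protocol": 17, "src_port": 53, "packet_size": "0-100"}, False)] * 2
)
rules = mine_blackhole_rules(flows)
antecedents = [r["antecedent"] for r in rules]
```

In this toy data, `{"protocol": 17}` alone is not a rule because it also matches the two flows that were not blackholed (confidence 0.6), while every combination that includes source port 123 reaches confidence 1.0.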
What does it look like as a result? So this is also what ‑‑ we are Open Sourcing here: you get a nice JSON list of possible ACLs and you get exactly the information that I have shown you before. You get a protocol as an IANA code, source port, packet size, which we found to be a good feature as well in addition, and confidence and antecedent support to give you an assessment of how good this rule actually works.
So let me quickly go over an analysis of the generated ACLs. On the X Axis here you are seeing the services we are covering, so we are simply looking up the source port and translating it to a protocol name. The first thing that sticks out is DNS here on the left side. That's of course because filtering DNS is really hard: it's a legitimate protocol, but often also used for DDoS, so the algorithm generates a lot of very specific ACLs because it then starts to look not only at the source port but also at attack services, certain packet sizes, and generates a lot of ACLs for DNS. And if you are looking at the other services that are covered here, it immediately turns out you find a lot of the usual suspects like SNMP and SSDP, but also new vectors that are not that well known in the community so far, like the web service discovery protocol, or here this is the HCP discovered ‑‑ used by infiltrated cameras. This is also only the top ten that we have generated, so there's a lot more in the long tail.
The other thing I wanted to show you here is the confidence distribution across all the ACLs. So, what you see on the X Axis here is the confidence, so we are cutting off the list at confidence of 0.9, so, anything that has 90% or higher probability to be routed into a black hole at the IXP is included in the list, but you see here that, in fact, 80% of the ACLs that we found have a confidence of 96% or higher to go to the blackhole.
And I guess ‑‑ yeah. How can you use this? I think there are a lot of ways to use this. That's why we thought it might be a good idea to Open Source it. You can find this list here, together with a bit of additional information on the format, on our GitHub account, and you can convert it to a suitable config format for your network, so you can directly generate ACLs out of that. You can use it for blocking, or for monitoring, because usually you can also attach counters to ACLs, so you could simply use it to investigate what's going on during an attack. And yeah, my recommendation would be to deploy it on your router as a list so you have it ready when you need it and then simply attach it to the next prefix that has a problem, right? It's an additional escalation step that you can use before you blackhole traffic or go to a scrubbing centre.
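A conversion from the JSON list into router configuration could look roughly like the following Python sketch. Everything specific in it is an assumption: the JSON field names (`protocol`, `src_port`, `packet_size`, `confidence`, `antecedent_support`) are only guessed from the description above, the sample entries are invented, and the output is a generic pseudo‑syntax, not any vendor's ACL format. Check the format notes published with the list before relying on any of this.

```python
import json

# Invented sample entries; field names are assumptions, not the published schema.
SAMPLE_JSON = """
[
  {"protocol": 17, "src_port": 53,  "packet_size": "1401-1500", "confidence": 0.97, "antecedent_support": 800},
  {"protocol": 17, "src_port": 123, "packet_size": "401-500",   "confidence": 0.99, "antecedent_support": 1200},
  {"protocol": 17, "src_port": 161, "packet_size": "1401-1500", "confidence": 0.92, "antecedent_support": 300}
]
"""

def render_acl(entries, min_confidence=0.9):
    """Turn rule entries into generic 'deny' lines, highest confidence first.
    Entries below `min_confidence` are skipped."""
    lines = []
    for e in sorted(entries, key=lambda e: -e["confidence"]):
        if e["confidence"] < min_confidence:
            continue
        lines.append(
            f"deny protocol {e['protocol']} source-port {e['src_port']} "
            f"packet-size {e['packet_size']}"
        )
    return lines

entries = json.loads(SAMPLE_JSON)
acl_lines = render_acl(entries, min_confidence=0.96)
```

Raising `min_confidence` above the 0.9 cut‑off of the list is one way to start with only the most reliable rules and widen the filter later if an attack is not covered.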
So, that's all. Thank you for your attention. And feel free to ask questions.
MARKUS DU BRUN: So are there any questions?
SPEAKER: Actually two questions. I guess the first is, does the model you are using work fast enough to be a reaction to a DDoS attack or is it pre‑emptive only?
MATTHIAS WICHTLHUBER: Can you repeat the question, I didn't get it.
SPEAKER: Does the machine learning model which generates the ACL work fast enough to react during an ongoing DDoS attack?
MATTHIAS WICHTLHUBER: This is not an online model. The idea was that we generate these ACLs, which are covering 300 types of DDoS attacks now, for you to have ready when you need them, right? I guess you could apply it online but it would be a lot more work to do all the stuff around it, and also we are not Open Sourcing the model, that's something I have to state clearly; we are just Open Sourcing the results of the model because there is, you know, GDPR compliance and private data, we don't want to do that.
SPEAKER: Stephen from AMS‑IX. I think it's very interesting, but so you say you Open Source or publish the results. Is it also something that you update regularly?
MATTHIAS WICHTLHUBER: I think we are going to update this, I cannot tell you what the frequency will be, it depends a bit on whether I see it's used by the community or not, or whether anybody finds it interesting or usable. Also, I don't think that this will age out too fast because it's mainly amplification, a lot of the stuff that is in there is well‑known, like the protocols that are covered, and what might happen is that you occasionally see new vectors that are not in there. I can imagine that we maybe do a quarterly or half‑yearly update and that should solve the problems pretty well.
SPEAKER: It would be interesting to see if you run this exercise again how much difference there is ‑‑ it would be interesting to see that.
MATTHIAS WICHTLHUBER: I think it's a nice solution for solving most of the problems but for sure it won't solve all of the problems, right.
SPEAKER: Sorry, one more. I'm just curious if maybe, you know, with IPv6, does the less fragmented space make it easier to do this kind of ACL pushing or does the increased space make it harder?
MATTHIAS WICHTLHUBER: Well, this is not relying on IP addresses at all, so the question whether it's IPv4 or IPv6 doesn't really matter, we are throwing this information away anyway in the first step after flow export. So we are really only looking at the header data and IPs don't play a role in that.
ANNA WILSON: Any more questions? I am not seeing any.
MARKUS DU BRUN: Thank you, Matthias.
MARKUS DU BRUN: I think this is very interesting stuff and I will definitely have a look into it. We have one more presentation, this is going to be online in Meetecho from Jeroen Leendertz, he is presenting on proactive blocking.
JEROEN LEENDERTZ: Hello.
ANNA WILSON: We hear you now.
JEROEN LEENDERTZ: Yes, now I can hear you. I am online, I would like to share my findings about reducing unwanted/bad traffic to permanent.
I started this journey a couple of years ago. Basically, when you look at log files on servers, you will see that there are a lot of requests made by basically the whole world, and at least half of those requests are not needed for the end customer. So what I usually do is talk to customers and ask what their target audience is. I have a couple of examples that I am going to show in the slides and they are based on two different websites, two different ‑ with two different ‑‑ the basics is they are based in the Netherlands and in Belgium, so their target audience is pretty local, and when you look at all the data the web servers receive, you see that there is a lot of abuse going on; a lot of requests are made to find out stuff about how you are using your website and how they can abuse it. A very simple ‑‑ how can I move this number, I don't have ‑‑ I don't have any options. Can someone help me?
MARKUS DU BRUN: Do you want me to click through your slides or ‑‑
JEROEN LEENDERTZ: I am looking for the options to do it myself but I don't see where the options are. So please ‑‑ okay. ‑‑ to basically protect the end customers' data, and the biggest benefit that I have seen is that you basically don't need any Captcha security any more for web forms etc. Less overall power consumption. I discussed this also with hosting companies.
MARKUS DU BRUN: Seems to have dropped out completely.
ANNA WILSON: Let's give a few minutes and see if we can get him back. I think I see him still on the call as a participant.
MARKUS DU BRUN: ‑ said Meetecho crashed on her side as well. Anyone else experienced this?
ANNA WILSON: Can you hear us? We hear you now.
JEROEN LEENDERTZ: I will choose to change the slides myself now, so it looks better. I was able to still hear you guys but you were not able to hear me. Hello?
ANNA WILSON: Yeah, we can hear you, go ahead at your leisure.
JEROEN LEENDERTZ: Let me start off with these slides. Basically I made some block lists and those are so effective that I want to share my findings, and the main benefits are that Captchas basically are not needed any more. The power consumption of the total solution and the total actions after the solutions will be less. Let me give you a simple example: when you don't have to use a Captcha in your form, there are no connections made to the Google services or whatever services you use. When you turn off the service and you don't use anything at all, then a lot of power will be used by forms that will be created automatically by BOTs, and I have some examples of that as well. There is one more improvement, but in my opinion the best of all is that you give way fewer opportunities to hackers and you can basically eliminate them from your online services.
So here, I have a pretty simple example of form submissions of one of the websites. I asked the owner if I was able to share the data from his website, yes, since one week ago I heard that I was going to give this presentation and accidentally the day after that, my block list got deactivated on a website and I forgot to do ‑‑ so that gives a pretty good example of what's happening.
The actual submissions that were done, that was good, I filtered them out a little bit and below you see five submissions that were done by BOTs, so as you can see, when the service was off and Tuesday morning I turned the service back on and you see after that submissions. When I go down the list, you will also see that the block list got deactivated on May 13th and it is full of automated form submissions made by BOTs. When you look before, there are literally no false submissions at all. And how did I get there? I started with the website, and basically when you look at logs, and you can also use plug‑ins for that, you will see that a lot of requests are made by non‑targeted users for the end person, and when you look at where the requests are coming from, then more than 99% is coming from data centre addresses. So how do I start? You see a submission is being done by a BOT, you check the IP address and you check the responsible organisation and, from there, you can basically see: is the responsible organisation a data centre, yes or no, and why would that data centre have to connect to a website of a local gardener? I don't see anything useful there for the end customer.
So let's go to the next slide. Here is the second website, and as we can see here, I took the server log, I analysed the log, and this is the access log, and this website has been online on this server since March '22, and we have like 1.4 million web server requests for the files, etc. When I take the ‑‑ the next one. Oh, this is double.
I have one ‑‑ okay, the other block I wanted to show you is basically the why, basically why I made it in this way ‑‑ when you have a web server you can use the host files to easily implement a block list, whatever kind you would like to use, and by doing it that way you can allow ‑‑ without doing anything because you can do it on the location marks or the server marks, wherever you really need to implement the block, so it's only part of the website that is being blocked and the rest is being served as normal. And the last slide I have was basically going to show you all the blocks that have been made, I have it here so I can have a small look. In the same where we have 1.4 million requests from normal users, we have a little bit over 200,000 requests from data centres that don't bring anything good to the end customer. And I thought I put it in the last slide but certainly it isn't. But when you look at this, you basically see that they are trying to put products in carts, they are trying to scam websites and overload websites, and when you don't actively block them, you basically cannot stop them. If the form submission, I was able to relay them back all the way to one version ISP and I don't get that on the abuse list, you don't get even better ‑‑ anything.
So, basically what I'm trying to say, and the question I'm trying to put to the room, is: how can we improve this? Because I contacted the RIPE service using the chat on the website, with the question: how can I get better data to better decide on what I want to block and what I don't want to block? And basically, for websites, if you ask me, human‑made connections are what you want to allow. Code‑based connections, like scripts or whatever other stuff, you don't want to allow unless the end customer wants to use their services, and think about Google and Facebook, think about Pinterest, you can name them and easily add them to the allow list so those services will keep working as ever.
The downside of this is that Google, of course, has, for example, a lot of agents which are found by everybody, and when somebody from outside the Netherlands or Belgium tries to connect, even if they are a human user, they will be blocked. Why? Because the end person cannot have any interest in those kind of users. And basically, what I would like to ask RIPE and the community is: is it possible that we let the LIRs add more information about what PLSs are used for, and basically are they going to be used for human interaction or for code? With that, I would like to open the discussion and see what the audience thinks about these kind of solutions.
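The blocking logic described here (allow listed services first, then block data centre ranges, and only on the parts of the site that need protection) can be sketched in Python as follows. This is an illustration only: the network ranges are documentation prefixes, and the names `ALLOWED_NETWORKS`, `DATACENTRE_NETWORKS`, `PROTECTED_PREFIXES` and `should_block` are hypothetical, not taken from the presenter's setup, which applies the lists in the web server configuration rather than in application code.

```python
import ipaddress

# Hypothetical example data, using documentation address ranges only.
ALLOWED_NETWORKS = [ipaddress.ip_network("198.51.100.0/28")]  # services the customer wants
DATACENTRE_NETWORKS = [
    ipaddress.ip_network(n)
    for n in ("192.0.2.0/24", "198.51.100.0/24", "2001:db8::/32")
]
PROTECTED_PREFIXES = ("/contact", "/cart")  # only these parts of the site are filtered

def _in_any(ip, networks):
    """True if `ip` falls inside any network of the same IP version."""
    return any(ip.version == net.version and ip in net for net in networks)

def should_block(client_ip, path):
    """Block data centre addresses on protected paths unless allow listed."""
    if not path.startswith(PROTECTED_PREFIXES):
        return False  # the rest of the website is served as normal
    ip = ipaddress.ip_address(client_ip)
    if _in_any(ip, ALLOWED_NETWORKS):
        return False  # explicitly allowed service
    return _in_any(ip, DATACENTRE_NETWORKS)
```

Checking the allow list before the block list means a wanted service keeps working even when its addresses sit inside a range that is otherwise classified as a data centre.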
ANNA WILSON: Thank you. Any questions or indeed answers? No, we don't have one left at the microphone.
MARKUS DU BRUN: Any comments on what Jeroen Leendertz is doing or any feedback for him on how to improve his filtering?
JEROEN LEENDERTZ: Basically 50% of all the traffic is being blocked and not needed, 50%. That is for small websites. Nobody has any questions, that's nice. I would like to ask, I am ‑‑ I am readying the last bits to actively put this online for testing; anybody who is interested, contact me by e‑mail for testing purposes.
MARKUS DU BRUN: Perhaps there will be some questions later on, so thank you.
JEROEN LEENDERTZ: You are welcome, thank you.
MARKUS DU BRUN: So, we are still good in time and we don't have anything on the agenda. Is there any other business that you would like to raise or discuss right now? Nothing. Okay. Then, I think we are safe to conclude this Working Group session, thank you all for attending and I will put the open issues on the mailing list. Thank you.
LIVE CAPTIONING BY AOIFE DOWNES, RPR