IRC-Source Crawler 3.x is now successfully updating InspIRCd-based networks. Once we have confirmed that the new crawler is stable, we will test it on other IRCds. The crawler is designed to scale up quickly with its Master/Slave system, so we plan to start with 2 crawl servers and see how things go. The new crawler, and the ability to scale, are necessary because the current crawler uses all available CPU and memory while crawling, which may be causing some of the errors we have had when indexing certain networks.
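To picture how a Master/Slave crawler spreads the load, here is a minimal Python sketch. It is not IRC-Source's actual crawler code: it assumes the master simply deals the network list out round-robin to each slave (a plain load-splitting technique chosen for illustration), and the names `Slave`, `assign_networks`, `crawl1` and `crawl2` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Slave:
    """A hypothetical crawl server that receives a share of the networks."""
    name: str
    networks: list[str] = field(default_factory=list)


def assign_networks(networks: list[str], slaves: list[Slave]) -> list[Slave]:
    """Master side: deal networks out round-robin so each slave gets a near-equal share."""
    if not slaves:
        raise ValueError("at least one slave is required")
    # Start from a clean slate so repeated calls redistribute everything.
    for slave in slaves:
        slave.networks.clear()
    for i, network in enumerate(networks):
        slaves[i % len(slaves)].networks.append(network)
    return slaves


if __name__ == "__main__":
    crawlers = [Slave("crawl1"), Slave("crawl2")]
    assign_networks(["Freenode", "EFnet", "Rizon", "IRCnet", "QuakeNet"], crawlers)
    for crawler in crawlers:
        print(crawler.name, crawler.networks)
```

In this model, adding a third crawl server is just a matter of appending another `Slave` to the list; the master redistributes the networks on the next call.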
The first ever recorded line of history in the IRC-Source database is 2014-05-27 23:57:48 UTC-6. We consider this date to be the birthday of IRC-Source. In recognition of IRC-Source turning 3, here is a brief history of IRC-Source, and some of our goals.
Last night we tried crawling networks only once per hour and ended up with missing crawler data. Normally we crawl at the beginning of the hour, then again 30 minutes later to pick up any networks that were missed.
Due to a slight time difference between the crawl server and the database server, some networks were being indexed at the end of the hour.
Each crawl also processes networks in the reverse order of the previous one. This means that if Freenode was crawled first one hour, it was crawled last the next, and so on. As a result, large networks were crawled at the end of the hour every other hour.
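The alternating order described above can be sketched in Python like this. The function name `crawl_order` and the idea of numbering crawls are assumptions made for illustration; the point is only that reversing the list on every other crawl moves the first network to the end of the next crawl.

```python
def crawl_order(networks: list[str], crawl_number: int) -> list[str]:
    """Return the order in which to crawl networks for a given crawl.

    Even-numbered crawls use the stored order and odd-numbered crawls use the
    reverse, so the network crawled first in one hour is crawled last in the next.
    """
    return list(networks) if crawl_number % 2 == 0 else list(reversed(networks))


if __name__ == "__main__":
    nets = ["Freenode", "EFnet", "Rizon"]
    for n in range(3):
        print(n, crawl_order(nets, n))
```

With a large network at the front of the stored list, every odd-numbered crawl pushes it to the end of the hour, which is the pattern that produced the late data.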
Conclusion / Fix
We have installed an NTP client daemon on both servers, but in case the clocks drift out of sync again, we now start indexing 1 minute after the hour. We will keep monitoring the situation, and if the delay causes problems we will revert it.
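A minimal sketch of the adjusted schedule, assuming the main crawl starts 1 minute past the hour and the follow-up crawl for missed networks runs 30 minutes after it. `START_DELAY`, `RETRY_GAP` and `next_crawl_times` are hypothetical names, and the exact follow-up timing after the change is an assumption, since only the 1-minute offset is stated above.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the main crawl starts this long after the top of the hour, giving
# headroom if the crawl and database servers' clocks drift slightly apart.
START_DELAY = timedelta(minutes=1)
# Assumption: the follow-up crawl for missed networks runs this long after the main one.
RETRY_GAP = timedelta(minutes=30)


def next_crawl_times(now: datetime) -> tuple[datetime, datetime]:
    """Return (main crawl, follow-up crawl) for the first main crawl strictly after `now`."""
    top_of_hour = now.replace(minute=0, second=0, microsecond=0)
    main = top_of_hour + START_DELAY
    if main <= now:
        # This hour's slot has already passed; use the next hour (rolls over days too).
        main += timedelta(hours=1)
    return main, main + RETRY_GAP


if __name__ == "__main__":
    now = datetime(2017, 5, 27, 23, 0, 30, tzinfo=timezone.utc)
    main, retry = next_crawl_times(now)
    print(main.isoformat(), retry.isoformat())
```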
We have launched a feedback forum on the main site. Your feedback is very valuable in helping us steer where IRC-Source goes in the future.
With your feedback we hope to improve IRC-Source in a way that everyone will enjoy.
<3 IRC-Source Staff
A fix has been implemented that now allows any member to post on the forums.
Forum activity has been pretty low recently, and we could really use your help to get things going, even if you just stop by to post an introduction for yourself and/or your network. Every post helps get our community moving.
New graphs have been added, and some existing graphs have been changed to display information more clearly.
Into the future
We are currently working on an API for IRC-Source, and will release more details once it’s finished.
The forums and moderation panels still need some work, and lately I’ve mostly been working on the backend (Admin Panel). Unfortunately, our advertiser (Google AdSense), which currently funds the site, has laid out some ground rules we must follow in order to keep ads running. If you have even $1 a month to spare, please become a patron.
Goals made, promises kept
After reaching our $20 goal, we promised to split the crawler across multiple servers to reduce crawl load. We also promised to enable IPv6 support on the crawl bot, which we have done. We have also added a new crawl server; check exempts for up-to-date exempt blocks.
Unfortunately, due to restrictions from our ad provider, currently this website’s only source of funding, new policies had to be put into place to avoid losing that funding.
Without ads this site can’t stay alive, and to be quite honest, the ads don’t even make enough to pay the $120/mo server bills. So, to ensure the survival of IRC-Source, new content policies had to be put into place; they can be found here.
To be fully transparent, we don’t really want to do this, but we have to. Here is how we have handled, and will continue to handle, this issue:
- Our new content policies only mirror Google’s AdSense content policies. We have not added anything extra.
- Channels that violate the content policies will be hidden from our website, but we will try to avoid hiding entire networks.
- If a network is entirely dedicated to something that violates our content policies, such as content of an adult nature (sex, porn, etc.), the network may be suspended as a last resort.
- We will do everything we possibly can to keep an entire network from being hidden from the site, but in some cases it may be unavoidable.
- We will continue to work with Google to take down pages that violate their content policies.
- We will continue to try to raise funding for IRC-Source through Patreon, where Patrons receive benefits such as an ad-free experience and a special user rank on the forums/profile.
- Once $150/mo is earned through Patreon, we will disable ads site-wide for every visitor. IRC-Source is non-profit; we are only interested in paying the server bills.
- A new report button will be added so you can let us know where content may violate the new policies.
Once again, we hate to do this, but we do not have a choice. It’s either this or we lose funding for the website. Alternatively, we could switch to an ad provider that doesn’t require us to censor our content, but unfortunately those advertisers tend to serve low-quality ads that are not suitable for IRC-Source.
♥ IRC-Source Staff
A complaint was received 3 days ago from our ad provider, Google AdSense. Fortunately, we caught the issue on the last day of the deadline and were able to fix it in the short time we had. We have taken measures to block channels and networks that violate these policies until we can secure better funding and remove advertisements altogether.
We’re not fans of all of the new policies, but to stay afloat we must comply with every guideline set out by our provider.
To be fully transparent, here is the full complaint from Google:
Action is required to bring your AdSense account into compliance with our AdSense program policies. Please make changes to the above site within three working days. We may spot-check the site again after this time. Once you’ve made the changes, hit the “Mark resolved” button below and complete the short form. Please be aware that if you do not make changes to bring your site into compliance, ad serving may be disabled to the website listed above.
Example URL: https://irc-source.com/channel/Rizon/%23far-public
Google ads may not be displayed on adult or mature content. This includes displaying ads on pages that provide links for or drive traffic to adult or mature sites.
For more information about keeping your content family-safe, you can:
- review our program guidelines.
- watch our animated video.
- watch the recording of previous adult content hangout.
- review these tips from the policy team.
As everyone should know by now, IRC-Source has been completely revamped, and that meant revamping the forums too.
Right now the forums are still in a testing phase, and some features are missing or don’t work at all, but they’re coming along very nicely.
New forums meant that we needed new & active staff members to help manage them.
I would like to welcome iota & Glolol to the team! I’m happy to say that I trust these two as Admins, and I think they will make great additions to the team. They are available to help you on IRC and in the Support forum if you need it, and they are often active when I cannot be. They should be able to help you with most basic operations, and if not, they can get hold of me.
How can I become staff?
First, don’t ask for it, because that’s the #1 way NOT to get it. Second, be an active and respectable member of the community: post on the forums and be respectful. If one of the current staff recommends you for a moderator position, then I or another admin will take it into consideration and approach you. DO NOT APPROACH STAFF FOR A MODERATOR RANK!
On January 8th the crawler was unable to connect to some networks because it appeared to be blacklisted by EFnet RBL. This may also have left the crawler unable to connect to some networks, primarily InspIRCd-based ones, for several days afterward. Upon discovering this, I contacted the EFnet RBL team to ask why we were blacklisted and how it could be prevented going forward, even though the IP already appeared to have been removed from the blacklist. As it turned out, we weren’t actually blacklisted at all. Here’s what Gavin at EFnetRBL had to say:
Unfortunately, the domain expired for a day or so, and godaddy’s wildcard dns caused some servers to incorrectly ban users.
There’s a couple ways to protect against this
1) bopm config has a “ban_unknown” option. a proper config should have this set to “no”, and specify which responses should be banned (ie – 127.0.0.*). only servers that had ban_unknown=yes would have banned based on the wildcard.
2) whichever servers you connect to, request kline/dline exempt for your ip. this way even if your bot gets banned for whatever reason, it can still connect.
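The `ban_unknown` behavior Gavin describes boils down to a small decision. The Python below is not BOPM’s source code; it is a hypothetical model in which `should_ban` compares a DNSBL reply against glob patterns such as `127.0.0.*` and only falls back to the `ban_unknown` setting for replies that match none of them. The address `203.0.113.7` is a documentation IP standing in for whatever a registrar’s wildcard DNS might return.

```python
from fnmatch import fnmatchcase
from typing import Optional


def should_ban(reply: Optional[str], ban_unknown: bool, banned_patterns: list[str]) -> bool:
    """Hypothetical model of a DNSBL ban decision.

    reply is the A record returned by the blacklist lookup, or None if the
    name did not resolve (i.e. the IP is not listed).
    """
    if reply is None:
        return False  # no answer: not listed, never ban
    if any(fnmatchcase(reply, pattern) for pattern in banned_patterns):
        return True  # a reply we explicitly chose to ban on
    # Any other answer (for example a registrar's wildcard parking IP)
    # only leads to a ban when ban_unknown is enabled.
    return ban_unknown


if __name__ == "__main__":
    patterns = ["127.0.0.*"]
    print(should_ban("127.0.0.4", False, patterns))    # listed: True
    print(should_ban("203.0.113.7", True, patterns))   # wildcard reply, ban_unknown=yes: True
    print(should_ban("203.0.113.7", False, patterns))  # wildcard reply, ban_unknown=no: False
```

In this model, a server with `ban_unknown` enabled would have banned every connecting user while the wildcard DNS was live, while a server with it disabled would only act on real listings in `127.0.0.*`.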
Please let this incident serve as a reminder to exempt our crawler hostname in your IRCd configurations.
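Conceptually, an exemption is checked before any ban is applied, which is why an exempt crawler still gets in even when a bad blacklist answer would otherwise ban it. This Python sketch assumes bans and exemptions are stored as `user@host` glob masks and uses `fnmatch` as an approximation of IRC mask matching; `matches_any`, `is_banned` and the hostname `crawl1.example.org` are hypothetical, and each IRCd has its own configuration syntax for exempts.

```python
from fnmatch import fnmatchcase


def matches_any(user_host: str, masks: list[str]) -> bool:
    """Case-insensitive glob match of user@host against a list of masks."""
    target = user_host.lower()
    return any(fnmatchcase(target, mask.lower()) for mask in masks)


def is_banned(user_host: str, ban_masks: list[str], exempt_masks: list[str]) -> bool:
    """An exempt mask wins over any matching (K-line style) ban mask."""
    if matches_any(user_host, exempt_masks):
        return False
    return matches_any(user_host, ban_masks)


if __name__ == "__main__":
    bans = ["*@*.example.org"]
    exempts = ["*@crawl1.example.org"]  # hypothetical crawler hostname
    print(is_banned("bot@crawl1.example.org", bans, exempts))  # False: exempted
    print(is_banned("bot@crawl1.example.org", bans, []))       # True: no exemption
```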