Category: Developer Blog


Say hello to iCrawl v2

It’s official! After countless hours of tinkering and testing, the new crawler has been rolled out!
iCrawl v2 fixes a bug in which networks randomly stopped being indexed, and provides an all-new scoring system which allows for network scores higher than 100.
The scoring system, of course, is not yet complete; it is pending updates based on feedback from an ongoing survey.

How are networks scored? You decide!


I recently stumbled upon a website that says you should not trust any IRC statistics unless you know exactly how the metrics are weighted, and that the stats could be misleading or push personal agendas… While IRC-Source has always kept how networks are weighted private, it hasn’t been to push any sort of agenda. We kept the system private in an attempt to keep people from abusing it; if everyone knew the weights, gaming them would be too easy. At the same time, I can’t help but feel that I developed the system around my own vision of an ideal network rather than what others may consider an ideal network. In that regard, I can’t help but feel I have failed you all.

I want to make it up to everyone by running a short survey (2 questions + optional feedback) to find out how you guys would like to see networks scored and listed on IRC-Source. By getting your feedback I will be able to develop a system that ranks networks based on your opinion of what makes a good network, rather than just my own.

If you care about how networks are scored and listed, which you should, then please click here to take the survey.

Thank you,
        Robert Whitney, founder IRC-Source

Change Log Q2/2016

Forum updates

  • The forum is now the first thing you see when coming into the website.
  • The forum has received a much-needed visual update and should be easier on the eyes, as suggested by loser.
  • New forums have been added, such as Network Introductions, Feedback, and Off Topic. As long as people use these forums I will keep them up there.
  • New posts are announced on the IRC-Source IRC channel.

Crawler Updates

  • iCrawl v2 is ready to go, and will be placed fully into production sometime this week, after a server migration.
  • iCrawl v2 fixes a bug where networks randomly stop being indexed.

Website Updates

  • Fixed a bug where the sidebar suggested min/max users, but max users were not capped.
  • Moved global statistics to the network directory page; they may move to their own page if things get too cluttered.
  • Home page redirects to forum.

 

New ticket system is online!

In an effort to improve IRC-Source’s end-user experience, a new ticket system is now online and the support forum has been cleaned out. The support forum has been changed to the Community forum, where you may ask the community for IRC-related help, and post requests for links, merges, etc.

Maintainer access requests should now be made through the Help Center and will be handled in the order that they are received. This will allow requests to be handled properly and in a timely manner. I’m truly sorry for any inconvenience the old system has caused, and we look forward to offering better, timely support moving forward.

Mail problems solved

For an unknown period of time the email system on the IRC-Source website was not working. We now have a new email provider and emails will be sent out as normal once again.
For accounts awaiting verification, I have gone ahead and manually verified them. If you were waiting on a password reset, or any other communications from the website, please try again.
If you are still not receiving emails, please check your spam filter, or join the IRC-Source help channel on irc.AltSociety.co in #IRC-Source.

IRC-Source will breathe new life!

When I started this project, I had a vision for what an IRC search engine should be. That vision has not yet been met. Throughout my adult life I have been struggling with bipolar disorder (extreme mood swings, depression and mania). My disorder has caused me to neglect this and many other projects that I’ve started over the years. I have started getting treatment for my disorder, and I feel better than ever before! I feel like my vision can finally be realized.

The future of IRC-Source

I am rebuilding much of the code from the ground up, starting with the crawler bot. I won’t waste your time with the could-be’s and the dreams I hope to achieve. Instead I will tell you about the realistic goals that I have in mind for the upcoming weeks. Last night, only 3 days into my treatment, I started working on the new crawler bot. The current production version of the crawler is an ugly mess written in a combination of PHP, Perl and Bash scripting. The new crawler that I have begun working on is a pure Perl implementation which I hope will be faster and much more effective.

Due to unforeseen circumstances, many of the requests to maintain network information have gone unanswered. A friend of mine tried to help out, but failed to reach out to many of the people who have requested this access. The new crawler will reach out to people who have made these requests, as well as those who make future requests, during its normal crawl operations. In other words, I hope to have this system automated in the very near future.

A new scoring system will be implemented, and it will take into account the following factors when scoring a network:

  • User count
  • Count of indexed channels (channels that have more than 2 people in them)
  • IRC Operator count, as well as the balance between IRC Operators and users.
  • IRCv3 Compliance
  • A small fraction of the score will be based on whether or not certain information about the network is available, which in most cases will require someone to maintain its information.

I would like to avoid giving specific details on exactly how networks are scored, as that would make it easier for people to manipulate the scoring system in their favor. Scores will be based on statistical information collected over the previous 1 week period, and re-calculated after each crawl. I believe this will give a more accurate depiction of each network.
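To make the "previous 1 week period, re-calculated after each crawl" idea concrete, here is a minimal sketch in Python (the real crawler is Perl; Python is used only for illustration). It keeps a rolling one-week window of crawl samples and recomputes a weighted score every time a crawl is added. The class name, the stat keys and the weights are all hypothetical; the actual factors are weighted privately and nothing here reflects them.

```python
from collections import deque
from datetime import datetime, timedelta

WINDOW = timedelta(weeks=1)  # "previous 1 week period"


class RollingScore:
    """Hypothetical scorer: weighted sum of per-stat averages over one week."""

    def __init__(self, weights):
        self.weights = weights   # assumption: illustrative weights only
        self.samples = deque()   # (timestamp, stats dict), oldest first

    def add_crawl(self, when, stats):
        """Record one crawl, drop samples older than a week, and re-score."""
        self.samples.append((when, stats))
        cutoff = when - WINDOW
        while self.samples and self.samples[0][0] < cutoff:
            self.samples.popleft()
        return self.score()

    def score(self):
        """Weighted sum of the windowed averages; deliberately has no upper cap."""
        count = len(self.samples)
        if count == 0:
            return 0.0
        total = 0.0
        for key, weight in self.weights.items():
            average = sum(stats[key] for _, stats in self.samples) / count
            total += weight * average
        return total
```

Because old samples fall out of the window on every crawl, a network that was busy a month ago but is quiet now is scored on its recent activity only.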

I have recently modified the behavior of the graphs for each network: instead of showing average counts, they now show the max counts for each period. The yearly graphs now show each network’s stats by week instead of by month.
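As a rough illustration of that graph change, the sketch below groups crawl samples by ISO week and keeps the maximum count seen in each week, rather than the average. The function name and the shape of the input are assumptions made for the example, not the site's actual code.

```python
from collections import defaultdict
from datetime import datetime


def weekly_max(samples):
    """Return {(iso_year, iso_week): max count} for (datetime, count) samples."""
    buckets = defaultdict(int)  # counts are never negative, so 0 is a safe floor
    for when, count in samples:
        iso = when.isocalendar()
        key = (iso[0], iso[1])  # ISO year and ISO week number
        buckets[key] = max(buckets[key], count)
    return dict(buckets)
```

Using the max instead of the mean means a single busy crawl in a week sets that week's point on the graph, so peaks are no longer smoothed away.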

Once the crawler is completed, which should be by the end of 2015, featured networks will be added to the home page. IRC-Source will feature a network for each month, week, day and hour.
If all goes as intended, I plan an update to record statistics for channels meeting certain criteria. Channels will need to have at least 20 users in them, and be visible in the /list at the time of crawl, to be tracked. At this time there are no plans to add maintainers for channels, but this could change in the future. From the list of tracked channels, I hope to add featured channels just as planned for featured networks.
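The channel criteria above could be checked roughly like this. The sketch parses raw RPL_LIST (numeric 322) replies, which is what a server sends back for /list, and keeps only channels with at least 20 users. A channel that shows up in the /list output at all is visible by definition. The function names and the threshold constant are hypothetical, and Python is used for illustration only.

```python
MIN_TRACKED_USERS = 20  # "at least 20 users" from the plan above


def parse_list_reply(line):
    """Parse one RPL_LIST line such as ':server 322 nick #chan 42 :topic'.

    Returns (channel, user_count), or None for any other kind of line.
    """
    parts = line.split(" ", 5)
    if len(parts) < 5 or parts[1] != "322":
        return None
    try:
        users = int(parts[4])
    except ValueError:
        return None
    return parts[3], users


def tracked_channels(list_lines):
    """Return {channel: users} for listed channels that meet the threshold."""
    tracked = {}
    for line in list_lines:
        parsed = parse_list_reply(line)
        if parsed is not None and parsed[1] >= MIN_TRACKED_USERS:
            tracked[parsed[0]] = parsed[1]
    return tracked
```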

A new ticket system will be put into place to better assist you when you need help from IRC-Source staff. I have some experience with the system I plan on putting into place. I believe it should be sufficient for making sure that help requests are handled properly and in a timely manner (unlike in the past). This also means that the forums will be removed from IRC-Source (nobody uses them outside of the help forum anyway).

I know I have not kept up on many of my plans previously, but I feel like I can finally concentrate on this project again. I know that my goals are realistic, and I feel like my vision will be realized and I hope that you will all really enjoy it. Here’s to a better year ahead of us, and thank you to those who have seen this through and stuck by my side throughout my time on this project.

Cheers mates! ~ xnite

Development Log – Week of 2015-05-03

In case you haven’t noticed, I haven’t been keeping up on the development logs. I realize that many of the members here like to be informed of what is new and what is in the works, and I think it’s very important to keep a line of communication open during the development process.

Obstacles

Right now I am performing a big migration to a new VM that I spun up for IRC-Source. A problem that I have run into is that the current crawler is not running efficiently, and I don’t want this problem to persist on a new server. I had intended to phase it out in favor of a new crawler anyway, so the sooner the better.

Currently working on…

New crawler bot

The current crawler is too basic and single-threaded, and there are too many scripts involved in the process (crawl list builder, crawler bot, parser). The new crawler will be faster, more efficient, and scalable.

Wishful thinking

The new crawler daemon will handle crawling, parsing, and sending data to an API on the website. The new crawler will have the ability to thread out, crawling more networks, faster. The goal is to have the entire indexing process done in under 5 minutes per run, but pretty much anything will be better than the current run time.

Not so wishful thinking after all

I’m not far from completing the new crawler, though there are a few things to polish off before it’s ready for testing on the production database. I have opened the floor for beta testing of the new crawler when it is ready. The goal date for beta testing is May 25th, with full integration by June 30th.

Development Roadmap – Q2:2015

It took me a while to put together this quarter’s Development Roadmap without giving away too many details or secrets about some of the upcoming work in this project.
I can now discuss a little bit about the crawler bot, and something new going on with the rating system. I want you all to know that I am very alive, and this project is still active.

Q1 Setbacks

Last month I got everything stuck into a bit of a migration phase, so I’m still moving some stuff over to a new server. One of the problems that exists on the previous server is that the crawl bot isn’t running as efficiently as it could be, so I’m really pushing to get version 2.x of the crawler done before I complete the migration process once and for all. So on top of keeping up on support requests, keeping up on development and keeping up on maintenance of the servers, things have been moving along slower than I had hoped. Some things expected to be done last quarter were not done, and for that I do apologize.

Q2 Expectations

I have some big plans this quarter, and I think that the important bits are going to work out on schedule this time, given that there are still two months left and quite a bit of the work has been started. Around Quarter 3 I’ll probably be working out the rough edges and starting on the public API.

While I’m working on the new crawler bot I will also need to rework some of the regular expressions, and this may cause some of the data parsing to break for some networks. If, after the migration to the new crawler, you notice that data does not look correct, please notify me via the support forum so that I can get it fixed.

Short Term Goals

These are goals that are planned to be wrapped up by the end of Q2:2015 (Q2=April 01-June 30): (Items that are crossed out have been completed)

  1. Network Administration Panel
    1. Manage users who have access to administrate networks
    2. Grant users the ability to “claim” networks securely by a process which connects to the network and verifies IRCOp access.
  2. Network Listings
    1. Featured Networks
    2. Categorize & Tag Networks
    3. Browse or Search by categories & tags
    4. Identify which listings are SFW and which ones are NSFW
    5. Split network page to pages?
  3. Improved Scoring System
    1. Removing cap on maximum score
    2. Less data driven, more people driven
  4. Crawler 2.x
    1. Crawler threads out to crawl up to 100 networks at the same time.
      1. Crawling process is expected to be under 15 minutes, where current is about 45-50 minutes.
      2. In the future crawler may attempt to stay connected to the network to avoid flooding connections.
    2. The crawler is being entirely re-written in Perl to interact with a new web API for crawling.
      1. Between the web API and threading, multiple servers could be used in the future to index all networks in as little as 2 minutes.
    3. Crawler will daemonize
      1. Crawler will always check if networks need to be indexed and stay on top of the task instead of checking every 30 minutes.
      2. In the future crawler bots could try to always remain connected to avoid flooding connections.
    4. During the testing period you may see a test crawl bot 2.x connecting from [email protected]. Feel free to add an exception for, or ban, the testing host name including the ident.
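The "thread out to crawl up to 100 networks at the same time" goal can be sketched with a bounded thread pool. The new crawler itself is being written in Perl; this Python version is only an illustration of the idea. `crawl_all` and the `crawl_one` callback are hypothetical names, and the callback stands in for whatever actually connects to a network and collects its stats.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

MAX_CONCURRENT_NETWORKS = 100  # "up to 100 networks at the same time"


def crawl_all(networks, crawl_one, max_workers=MAX_CONCURRENT_NETWORKS):
    """Run crawl_one(network) for every network, at most max_workers at once.

    Returns {network: result}. If a crawl raises, the exception object is
    stored as that network's result so one bad network cannot stop the run.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(crawl_one, net): net for net in networks}
        for future in as_completed(futures):
            net = futures[future]
            try:
                results[net] = future.result()
            except Exception as exc:  # record the failure and keep going
                results[net] = exc
    return results
```

Threads suit this kind of work because a crawl spends most of its time waiting on the network, so the total run time is driven by the slowest networks in each batch rather than by the sum of all of them.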

Long Term Goals

These are the current long term goals that are planned for completion, hopefully, by the end of 2015.

  1. Administration & Moderation Panels (Q2:2015 – Q3:2015)
    1. Allow volunteer staff to help trigger network verification process and provide the tools necessary for them to help get support requests moved along quicker.
      1. Support requests can be done quicker.
      2. Gives members an opportunity to get their foot in the door involving themselves in the process
    2. Need to work out volunteer agreement before I can start accepting volunteers.
  2. Channel Listings
    1. Channel Pages w/ Graphs for channels where there is an established maintainer and contact information is provided.
    2. Featured Channels
    3. Categorize & Tag Channels
    4. Identify which listings are SFW and which ones are NSFW
  3. Public API (Q3:2015?)
    1. View Network Information
    2. View Channel Information
    3. View Public user profile information
  4. Improved crawl bots – (Imported from unfinished tasks in previous quarter, please see Crawler 2.x in short term goals)
    1. Identify Janus linked networks where possible & automatically add Janus flag to network record

Development Delays

My apologies, I have not been very active in development over the past week as I have had a family emergency arise.
I will be out of town this week, and I am sorry for any inconvenience this may cause.
In my free time, however, I have been working on some of the smaller tasks, such as working out kinks in the crawler bot: the bot not properly detecting when it has failed to connect, or accidentally marking itself as banned from a network where it has not been.
I will also still be checking support requests while I am out of town, though less frequently than I typically would while at home.

I hope you will all understand, and as always keep checking back for new updates.

Development Log – Week of 2015-02-22

Last week’s development got off to a very slow start with a few minor mishaps & downtime, but I’m excited to kick this week off by announcing a few new updates:

Firstly, there is now a Support Forum where you can make support requests and request access to maintain networks. This will remain online, along with the Off-Topic forum if anyone actually cares to post there. I have also added paging support to the network browse/search page, which should speed up page load times on mobile devices and overall improve the visual experience.

Some of the things that I had anticipated did not get done last week, and are now being moved over to this week’s agenda. Featured networks are going to be a big focus area, as well as continuing to draft different ideas for an automated verification process for maintainers that provides both convenience and security (two things that don’t typically complement each other).
Flags and categories are still going to be a huge focus; now that paging is completed, their implementation should be simpler.

I don’t see all of this taking too long. If there is time at the end of the week, I will try to fit in some of the cosmetic changes that have been requested before hitting some of the heavier development next week.