In case you haven’t noticed, I haven’t been keeping up with the development logs. I realize that many members here like to be informed of what is new and what is in the works, and I think it’s very important to keep a line of communication open during the development process.
Right now I am performing a big migration to a new VM that I spun up for IRC-Source. One problem I have run into is that the current crawler does not run efficiently, and I don’t want this problem to persist on the new server. I had intended to phase it out in favor of a new crawler anyway, so the sooner the better.
Currently working on…
New crawler bot
The current crawler is too basic and single-threaded, and there are too many scripts involved in the process (crawl list builder, crawler bot, parser). The new crawler will be faster, more efficient, and scalable.
The new crawler daemon will handle crawling, parsing, and sending data to an API on the website. It will be able to spawn multiple threads, so it can crawl more networks in less time. The goal is to have the entire indexing process done in under 5 minutes per run, but pretty much anything will be better than the current run time.
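For anyone curious how one process can crawl, parse, and submit, here is a rough Python sketch of the idea. It is not the actual crawler code. The function names (`parse_list_reply`, `crawl_network`, `run_crawl`), the `fetch_lines` and `submit` callables, and the payload layout are all made up for illustration. It also assumes each network answers with standard IRC `322` (RPL_LIST) lines.

```python
import json
from concurrent.futures import ThreadPoolExecutor


def parse_list_reply(line):
    """Parse one IRC RPL_LIST (numeric 322) line into a channel record, or None."""
    # Expected shape: ":server 322 <nick> <channel> <user count> :<topic>"
    head, _, topic = line.partition(" :")
    parts = head.split()
    if len(parts) < 5 or parts[1] != "322":
        return None
    try:
        users = int(parts[4])
    except ValueError:
        return None
    return {"channel": parts[3], "users": users, "topic": topic}


def crawl_network(network, fetch_lines):
    """Crawl one network and return its parsed channel list."""
    # fetch_lines is a hypothetical callable that yields raw reply lines
    # for a network; a real crawler would open an IRC connection here.
    channels = []
    for line in fetch_lines(network):
        record = parse_list_reply(line)
        if record is not None:
            channels.append(record)
    return {"network": network, "channels": channels}


def run_crawl(networks, fetch_lines, submit, max_workers=8):
    """Crawl networks in parallel threads, then hand each result to submit()."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map keeps results in the same order as the input networks.
        results = list(pool.map(lambda n: crawl_network(n, fetch_lines), networks))
    for result in results:
        # submit is a hypothetical callable, e.g. an HTTP POST to the site API.
        submit(json.dumps(result))
    return results
```

Passing `fetch_lines` and `submit` in as arguments keeps the sketch runnable without a network connection: a test can supply canned reply lines and collect the JSON payloads in a list, while a real daemon would plug in its own IRC client and API call.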
Not so wishful thinking after all
I’m not far from completing the new crawler, but there are a few things to polish off before it’s ready for testing on the production database. I have opened the floor for beta testing of the new crawler when it is ready. The goal date for beta testing is May 25th, with full integration by June 30th.