Server architecture

The more rows we have on the server side, the slower the server gets…

There are currently 13,038,474 URLs in the system. With this data size, it takes only 726 milliseconds to provide a workload to a crawler asking for more work, but it takes 47 seconds to commit the results… It took one second when the table was 2B rows big, 10 seconds at 5B rows, and so on. The bigger the table gets, the slower the server responds.

As a result, I have worked more actively on the new architecture. Data retrieval still takes a few milliseconds and should become a bit faster, but committing data now takes less than 2 seconds on a 40B-row table! It took 2 seconds with a 1B-row table and it still takes 2 seconds with a 40B-row table; this time is constant. Response time will not degrade even as the load increases. The final goal is to reduce the commit time to under a second.
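I haven't detailed the new design here, but one common way to make commit time independent of table size is to append results to an unindexed log table and defer any indexing or merging to a background step. Below is a minimal sketch of that idea, assuming a SQLite-style store; the database file, table name, and columns are hypothetical, not the project's actual schema.

```python
import sqlite3
import time

# Hypothetical schema: crawl results land in an append-only log table
# with no indexes, so insert cost depends only on the batch size,
# not on how many rows the table already holds.
conn = sqlite3.connect("crawler.db")
conn.execute(
    "CREATE TABLE IF NOT EXISTS results_log ("
    " url TEXT, status INTEGER, fetched_at REAL)"
)

def commit_results(batch):
    """Append a batch of (url, status) results in a single transaction
    and return how long the commit took, in seconds."""
    start = time.monotonic()
    with conn:  # one transaction per batch; commits on exit
        conn.executemany(
            "INSERT INTO results_log (url, status, fetched_at)"
            " VALUES (?, ?, ?)",
            [(url, status, time.time()) for url, status in batch],
        )
    return time.monotonic() - start

# The elapsed time stays roughly flat whether results_log holds
# a thousand rows or billions, which is the property described above.
print(commit_results([("http://example.com/", 200)]))
```

Again, this is only an illustration of the constant-commit-time property, not the actual server implementation.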

Given all this, I haven't made many modifications on the client side. You can still download and run the current version available in the download section.
