This information comes from a member of a team that worked with the RTS system. The following is in his words:
RTS: As you stated, RTS was a slick design. It was a system built to run on 339 servers across the country, with over 33k phones and at least 26k users logged in to the production system. It was a shame that issues outside the main RTS software denied it the limelight. The visualization and transmission aspects were not part of the RTS system; the RTS system itself comprised:
mobile phone software – a J2ME application
the web service processing the requests – a Servlet running on GlassFish
Memcache to cache data that was not changing
the database – running on MySQL
All of which were based on tried-and-tested open technologies.
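The role Memcache plays in that stack is the classic cache-aside pattern: the servlet checks the cache first and only queries MySQL on a miss, which works well for candidate data that never changes during polling. The source does not include any code, so the sketch below is only illustrative: a `ConcurrentHashMap` stands in for the real Memcache client, and `loadFromDatabase` stands in for the JDBC query; both names are assumptions.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the cache-aside lookup implied by the stack:
// check the cache first, fall back to the database only on a miss.
public class CandidateLookup {
    // Stand-in for the Memcache client (assumption, not from the source).
    private final Map<String, String> cache = new ConcurrentHashMap<>();

    // Stand-in for the MySQL query (assumption, not from the source).
    private String loadFromDatabase(String stationId) {
        return "candidates-for-" + stationId;
    }

    public String candidatesFor(String stationId) {
        // First call for a station hits the "database"; repeat calls
        // for the same station are served from the cache.
        return cache.computeIfAbsent(stationId, this::loadFromDatabase);
    }

    public static void main(String[] args) {
        CandidateLookup lookup = new CandidateLookup();
        System.out.println(lookup.candidatesFor("station-001"));
        System.out.println(lookup.candidatesFor("station-001")); // cached
    }
}
```

Since the cached data was "not changing", this simple fill-once scheme needs no invalidation, which is likely why it was a good fit.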
The truth is that around 8 PM on Monday the /var partition on the provisioning server (running CentOS, not Windows) filled up, and the underlying RDBMS therefore failed. It was a shame, because there was so much space on that server, just not in the needed place. I can state that there was no hacking (nothing points to it). I can also state that RTS was not creating files, so the partition was not filled by RTS data but by MySQL binary logs, generated in situ because database replication was switched on.

This meant that once the provisioning server went down, there could be no new logins, and requests for candidate data for a polling station could not be serviced. However, those individuals who had logged in at least once before, in accordance with the procedure, were able to send results to the other servers that were still up. This explains the "slow down" experienced after the provisioning server went down.
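The failure mode described here, binary logs from replication silently filling a small partition, is one that MySQL can be configured to bound. As a hedged sketch (the file path is an assumption; the option names are standard MySQL 5.x settings), a my.cnf fragment like the following would have moved the logs to a larger partition and capped their growth:

```ini
[mysqld]
# Write binlogs to the partition that actually has space, not /var
log-bin          = /data/mysql-bin
# Auto-purge binlogs older than a week so replication cannot fill the disk
expire_logs_days = 7
# Rotate each binlog file at a bounded size
max_binlog_size  = 100M
```

Existing logs can also be trimmed at runtime with MySQL's `PURGE BINARY LOGS` statement, provided all replicas have already read them.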