This information comes from a member of the team that worked on the RTS system. The following is in his words:
RTS
As you stated, RTS was a slick design: a system meant to run on 339 servers across the country and over 33k phones, with at least 26k users logged in to the production system. It was a shame that issues outside the main RTS software denied it the limelight. The visualization and transmission aspects were not part of the RTS system, so the RTS system comprised:
mobile phone software – a J2ME application
the web service processing the request – a Servlet running on GlassFish
Memcache to cache data that was not changing
the database – running on MySQL
All of which were based on tried and tested open technologies.
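The role Memcache plays in the list above can be illustrated with a minimal cache-aside sketch in Python. This is not the RTS code (the real service was a Java Servlet). The dictionary standing in for Memcache, the function load_candidates_from_db, the key format, and the station identifier are all hypothetical names chosen for illustration.

```python
# Hypothetical sketch of a cache-aside lookup for data that does not change,
# such as the candidate list for a polling station. A plain dict stands in
# for Memcache and a function stands in for the MySQL query.

stats = {"db_calls": 0}  # counts how often the stand-in database is hit
cache = {}               # stand-in for a Memcache client


def load_candidates_from_db(station_id):
    """Stand-in for a database query (assumed, not the real RTS schema)."""
    stats["db_calls"] += 1
    return ["candidate A", "candidate B"]


def get_candidates(station_id):
    """Return candidate data, consulting the cache before the database."""
    key = f"candidates:{station_id}"
    if key in cache:
        return cache[key]              # cache hit: no database work
    value = load_candidates_from_db(station_id)
    cache[key] = value                 # populate the cache for later requests
    return value


if __name__ == "__main__":
    get_candidates("station-001")
    get_candidates("station-001")      # served from the cache
    print("database calls:", stats["db_calls"])
```

Because the cached data does not change, the sketch never invalidates entries. Repeated requests for the same polling station reach the database only once.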
The Failure
The truth is that around 8 PM on Monday the /var partition on the provisioning server (running CentOS, not Windows) filled up, and the underlying RDBMS failed as a result. It was a shame, because there was plenty of space on that server, just not in the place where it was needed. I can state that there was no hacking (nothing points to it). I can also state that RTS was not creating files, so the partition was filled not by RTS data but by MySQL binary logs, which were being generated in situ because database replication was switched on. This meant that once the provisioning server went down, new logins and requests for candidate data for a polling station could not be serviced. However, those individuals who had logged in at least once before, in accordance with the procedure, were able to send results to the other servers that were up. This explains the “slow down” experienced after the provisioning server went down.
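The kind of check that would have flagged this condition can be sketched in a few lines of Python. This is a minimal illustration, not something the RTS team ran: the function names and the 90% threshold are assumptions, and only the standard library call shutil.disk_usage is used.

```python
import shutil


def partition_usage_percent(path):
    """Return the percentage of the partition holding `path` that is in use."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    if usage.total == 0:
        return 0.0                   # avoid dividing by zero on odd filesystems
    return 100.0 * usage.used / usage.total


def is_nearly_full(path, threshold_percent=90.0):
    """True when usage on the partition holding `path` reaches the threshold."""
    return partition_usage_percent(path) >= threshold_percent


if __name__ == "__main__":
    # Check each mount point separately. "/var" is the partition named in the
    # account above; the 90% default threshold is an arbitrary, assumed value.
    for mount in ("/", "/var"):
        try:
            pct = partition_usage_percent(mount)
        except OSError:
            continue  # the path does not exist on this machine
        flag = "WARNING" if is_nearly_full(mount) else "ok"
        print(f"{mount}: {pct:.1f}% used [{flag}]")
```

Checking each partition on its own matters here. The server as a whole had plenty of free space, so a single machine-wide figure would not have shown that /var was full.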
The following are some figures of the success the system achieved when it was up:
16,617 polling stations had reported the presidential race
8,130 polling stations had reported the governor race
8,229 polling stations had reported the senator race
11,020 polling stations had reported the national assembly race
8,613 polling stations had reported the women rep race
9,228 polling stations had reported the county assembly ward rep race
It was a shame that the technical team did not deduce sooner that the issue that took down the server was a simple disk space problem. The team spent time searching for the problem elsewhere (concurrency, the number of threads being spun up, the maximum number of connections on MySQL). What makes the team sad is that RTS missed its golden window to shine because of non-programming errors and the time wasted before we were able to detect such a trivial issue.
Full Article: IEBC Tech Kenya — A Clear Definition of the IEBC Tech Failure.