The Technology Behind Imo Judge

An online judge is an automated grading system for programs, built for programming contests. It compiles and runs the source code submitted by a contestant (hereafter called the user program) and grades based on the results. A programming contest must be run fairly, calmly, and without incident. In an online judge, which has no choice but to perform the extremely dangerous act of executing arbitrary submitted source code, a variety of techniques are employed to meet these requirements. In this article I introduce several of these techniques, using Imo Judge, which I built, as an example.

Server Architecture

Contestant → Web server → Database ← Judge server

Figure 1: The current architecture of Imo Judge (arrows show the direction of connections)

At the ICPC, a judging system called PC², written in Java, is used. TopCoder likewise uses a judging system written in Java. However, running an application on the client side makes it hard to avoid environment-dependent issues (especially around networking), and if a problem does occur, debugging becomes extremely difficult. In particular, if it ends up throwing an exception and terminating, recovery becomes hard. For this reason, Imo Judge interacts with contestants through the browser, which absorbs environment dependencies well.

In Imo Judge, the web server and the judge server are completely separated, with a database placed between them. This lowers the coupling between the two and makes it possible to scale the judge servers. As a result, in every contest we were able to run multiple judge servers and reduce contestants' waiting time, and even when a re-grading occurred, we could handle it smoothly by temporarily increasing the number of judge servers.

During a contest, the judge servers run on Amazon EC2. Although it is done manually, the number of servers can be scaled within a few minutes once instructed. Because cost is billed by the hour, it is extremely cheap. For example, the amount spent on judge servers for the 2011 Kyoto University Programming Contest was

practice servers … 1 server × 24 hours × $0.07/server-hour = $1.68, and
contest servers … 5 servers × 6 hours (including extra time before and after) × $0.07/server-hour = $2.1,

a breakdown that let us run the contest at a very low price of about 300 yen.

Design of the Judging System

Diversifying Execution Methods

In recent years, the Informatics Olympiad has proposed a wide variety of problems, not confined to the batch format (where all input is given and the output is checked).

Batch format (including special judge): The most typical format, in which a fixed input is given in full at the start. Many batch-format problems have a fixed output, but some—such as problems involving floating-point numbers—have an indeterminate output. Grading such problems requires analyzing the produced data with a separate program.
Output-only format: In this format, the input data is given to the contestant in advance, and they run their program locally, create output files, and submit them. It is adopted for problems whose solutions are not fixed, such as NP-hard problems, and it is highly distinctive in that contestants can edit by hand and change the program they use while actually looking at the input and output data.
Reactive format: Whereas the batch format gives all the input data at the start, the reactive format gives input in response to the user program's output. It can handle not only problems that test online algorithms but also problems—like tic-tac-toe—where the next input changes depending on the user program's output. At the 2011 Kyoto University Programming Contest, we used this format to pose a problem testing randomized algorithms.
Encoder/decoder format: A format that has appeared in recent years, in which contestants write two programs that communicate with each other. It is used for problems such as those testing channel capacity.

Attack Techniques Against the Judging System and Their Countermeasures

A judging system must never crash no matter what program it compiles and executes. Therefore, it must monitor user programs and stop execution whenever a situation arises that could endanger the system. The judging system MOE, used in recent International Olympiads in Informatics, monitors user programs with ptrace. However, using ptrace requires deciding how to act on each system call and necessitates intervention, which also incurs a slowdown. To run user programs in as close to their normal state as possible, Imo Judge does not use ptrace; instead, it restricts user programs using ulimit and cgroups, which operate at the kernel level.

Memory-Exhaustion Attacks

An attack that allocates a large amount of heap memory. Heavy swapping can occur, and its effects can destabilize the judge server. Because ulimit cannot limit memory, cgroups is used to limit it. The target of the limit must be Resident Memory, not Virtual Memory. Care is especially needed when running Java: if a limit is placed on Virtual Memory, it may not even start up.

Disk-Exhaustion Attacks

An attack that reduces the free disk space by continuously writing to disk. Because this can cause unexpected errors on the judge server, ulimit is used to impose limits.

Fork-Based Attacks

An attack known as a fork bomb. ulimit is used to limit the number of processes. Because Java spawns multiple processes, imposing a strict process limit prevents Java programs from running. Also, a fork bomb is hard to fully terminate by killing processes one at a time (because new processes are spawned one after another while you are killing them), so we wipe them out using kill -1.

Compile-Error-Based Attacks

C++ templates are Turing-complete without a depth limit, so compilation may not terminate. Worse, it is possible to craft source code that keeps emitting warnings, so the compilation process must be written with proper awareness of such situations. Specifically, a time limit must be set on compilation, and compile-error output must be truncated appropriately.

kill-Based Attacks

If the program monitoring the user program runs with the same privileges as the user program, the user program can successfully kill it. In that case, it is impossible to tell whether it was an attack by the user program or whether the judge program terminated due to a bug. Therefore, the judge program must run under different privileges.

Attacks Using /tmp and /var/tmp

If you send a program that writes a program into /tmp or /var/tmp, and then submit a program that includes those files, you can seemingly achieve short coding. Because the first-generation Imo Judge deleted only /tmp, Professor iwiwi wrote into /var/tmp and the attack succeeded.

The Future of Imo Judge

Going forward, Imo Judge plans to separate the UI and API so that third parties can build UIs while keeping the system safe. We are also considering improving judging speed by placing test cases into the judge queue separately.