Why competitive programmers should join ISUCON for tremendous growth
ISUCON
is a programming contest about making web applications faster.
Competitive programming is the best possible material for learning algorithms, but outside of research positions it is often hard to see it as a directly applicable skill.
ISUCON is packed with the foundational knowledge needed in the web industry, and I wrote this article believing that if many competitive programmers join ISUCON, both the competitive programming community and the web industry can leap forward.
I also hope that if competitive programmers win prizes, the latent value of competitive programming will be recognized and its visibility will rise.
When it comes to bottlenecks in web development, anything that can be solved with knowledge has, in most cases, already been solved by well-established software if you use standard tools; instead, many problems arise because time complexity was not considered—for example, a full scan is used where adding an index would make it fast—so I believe the experience gained in competitive programming can be put to good use.
Introduction
This article mainly explores how to win at ISUCON using C++.
Because few people use C++ in the Japanese web industry, it is not provided as a reference implementation at ISUCON.
However, as its wide use in competitive programming shows, C++ is the fastest language.
In fact, Google develops web applications in C++ and keeps CPU costs low.
To show that, once you set up the environment and get used to it, C++ lets you optimize efficiently in web application development without sacrificing development speed,
Team Anago entered ISUCON6 with a custom C++ server and successfully advanced to the finals.
Being able to wield C++ freely is one of the strengths of competitive programmers, so in this article, based on the experience gained through ISUCON6, I introduce how to take on ISUCON efficiently in a language that is not provided as a reference implementation.
Overview
If you could rewrite the entire web application in C++, victory would be guaranteed, but unfortunately that is not realistic. So we design the C++ server so that, in its initial state, it simply forwards the responses of the reference implementation's web server as-is, enabling incremental development. There are broadly two approaches to incremental development.
- Filter style … You replace the heavy processing parts of the web application with a special string (for example, <!--hoge-->), process that part quickly in the C++ server, and substitute it back. It has the advantage of not requiring deep knowledge of the reference implementation language, but the disadvantage that you must rewrite in fairly large units. This is mainly what we used in ISUCON6.
- Callback style … The web application calls the custom server. It has the advantage of allowing fine-grained, incremental development at the function level, but the disadvantages that you must implement callback handling for the reference implementation language and that latency tends to grow.
Browser
→
Web server
→
Database
Figure 1: A typical web service architecture
Browser
→
Proxy server
→
C++ server
→
Web server
→
Database
Figure 2: The architecture used in ISUCON6 (red indicates newly added parts)
Figure 1 is the typical architecture of a web application given at ISUCON, and Figure 2 is the architecture we used in ISUCON6. In the Figure 2 architecture we use nginx as the proxy server, which makes it easy to supplement features the C++ server lacks. For example, in the ISUCON6 finals a problem involving streams was posed; since it was unimplemented in the C++ server, we configured the proxy server to request the web server directly when such requests came in. The proxy server can also return static files directly, which prevents the C++ server from wastefully consuming threads.
If you let the C++ server access the database in addition to requesting the web server, you can drastically reduce latency and access data casually. Also, if you preload the database contents at startup, it becomes easy to keep persistence while still using a cache.
Strategies
Here, based on our experience taking on ISUCON6 with a C++ server, I introduce strategies we will keep using in future events and strategies we plan to adopt from next time.
Build a portable environment using Docker
At ISUCON, the OS and software versions are not announced in advance, and setting up the C++ server took a lot of time. We plan to use Docker so that, as long as we are on a Linux environment, we can build an environment that always works as expected simply by running a script prepared in advance.
Docker has some overhead, so to minimize it we disable the following isolation features.
- Disabling network isolation … Network isolation makes the Docker process consume a lot of CPU for forwarding, so we configure it to use the host network.
- Mounting some file systems … The file systems Docker provides automatically are slow, so we mount as needed. For example, nginx temporarily writes a file when it receives a POST request, and if this write destination is on the Docker file system it can be slow.
- (Optional) Disabling process isolation … On some OSes, if you disable network isolation without also disabling process isolation, you may get permission-related errors, so we optionally disable process isolation as well.
Running commands like apt on the images Docker provides by default can take time because the repository is located in the US. We recommend configuring a domestic repository. Building the Docker image in advance so that you only need to download it may enable even further speedups.
Create an environment that is easy to develop in as a team using GitHub and bazel
Team development is done via GitHub. If you develop with a policy that avoids having to edit the same file as much as possible (for example, a design where a handler is registered simply by linking it), you reduce the frequency of conflicts in git and save the trouble of merging. In competitive programming contests it is often unnecessary because everything fits in a single file, but writing across multiple files makes a build tool indispensable, and bazel has very intuitive and convenient syntax compared to other well-known tools (such as CMake).
Prepare libraries commonly used in the web
To avoid falling behind web-oriented languages, it is convenient to be able to use libraries like the following.
- HTTP connection library … You need to be able to make HTTP connections to the reference implementation. Since something mimicking an external service may be given and speak HTTP2, if you have time it may be good to be able to connect over HTTP2 as well.
- Database libraries … Connecting to MySQL is a must. Problems using Redis, PostgreSQL, MongoDB, memcached, and so on are possible, so if you have time it may be wise to prepare them.
- String libraries … Being able to process strings easily gives you room to breathe during development. Regular expressions are not essential since they can be substituted by other means, but they appear frequently in web development, so it may be good to be able to use them.
- Encoding libraries … BASE64 encoding, URL encoding, JSON encoding, and the like appear frequently in web development.
- Hashing libraries … MD5, SHA1, SHA256, and the like are also commonly used.
- Date function libraries … You may need to process dates given in HTTP headers or dates stored in MySQL.
Learn how to create database indexes
Since it is hard to abandon the database entirely, at least minimal database tuning is necessary. SQL servers are quite smart, and if you set up indexes appropriately they are used appropriately. I think you can judge, from the experience cultivated in competitive programming, what to sort by so that reads are fast. Check the queries being used (in addition to reading the code, checking the slow query log or running EXPLAIN makes it more certain) and set appropriate indexes. It is convenient to prepare software that lets you handle the database from a GUI, such as phpMyAdmin.
Use nginx for log collection, serving static files, and forwarding requests
Since we use nginx as the proxy server, you need to be able to do basic configuration. Below is a basic forwarding configuration. It is good to look up other forwarding methods and how to terminate HTTPS in advance. Because we want to quickly check what kind of requests are coming in, we set up nginx to collect logs. Being able to return static files with nginx too reduces the load on the backend. nginx sometimes fails to start properly because it tries to read a file specified at compile time on startup, and it may conflict with the nginx included in the given server image, so it is a good idea to rebuild it from source or confine it in Docker.
worker_processes auto;
events {
worker_connections 8096;
multi_accept on;
use epoll;
}
http {
sendfile on;
tcp_nopush on;
tcp_nodelay on;
keepalive_timeout 15;
upstream app {
server 127.0.0.1:8000;
keepalive 16;
}
server {
listen 80;
location / {
proxy_pass http://app;
proxy_redirect off;
proxy_set_header X-Real-IP $remote_addr;
proxy_buffering off;
proxy_buffer_size 128k;
proxy_buffers 100 128k;
proxy_http_version 1.1;
proxy_set_header Connection "";
}
}
}
Configure auto-start using an init.d script
At ISUCON you must configure the server program to start automatically. Configuring it with systemd or supervisord is the modern approach, but since you may practice with images that are a few years old, we start it with init.d so that it also works on somewhat older versions of CentOS or Ubuntu.
Configure Linux kernel parameters
The initial state is not tuned for web services, so port exhaustion and the like often occur. You may think of the following as a good-luck charm and just run it.
cat <<'EOM' > /etc/sysctl.conf
net.core.netdev_max_backlog=32768
net.core.rmem_max = 16777216
net.core.somaxconn=32768
net.core.wmem_max = 16777216
net.ipv4.ip_local_port_range= 10000 65535
net.ipv4.tcp_fin_timeout=10
net.ipv4.tcp_max_syn_backlog=32768
net.ipv4.tcp_rmem = 4096 349520 16777216
net.ipv4.tcp_timestamps = 0
net.ipv4.tcp_tw_recycle=1
net.ipv4.tcp_tw_reuse=1
net.ipv4.tcp_wmem = 4096 65536 16777216
net.ipv4.tcp_rfc1337=1
net.ipv4.tcp_keepalive_probes=5
net.ipv4.tcp_slow_start_after_idle=0
net.core.somaxconn=65535
EOM
sysctl -p
Other
Because ISUCON is entered by multiple people, communication within the team is extremely important. Sharing Slack or Google Docs in advance is convenient for various things, such as summarizing the outline of the problem or sharing commands. Small touches also matter within the limited contest time—for example, writing the server IP addresses into /etc/hosts at the start of the contest so that exchanging addresses is easier—and since you often need to call out to each other when running the benchmark, we also recommend doing team practice before the contest.
In closing
We plan to release an environment template using a C++ server with plenty of time to spare before the ISUCON7 qualifier. I look forward to seeing all you competitive programmers participate and win prizes.