Fixing a build failure on Travis for a Node application
After taking CS3281/82 this semester, I became quite interested in open source projects. Recently, I have been contributing to Unblock-Youku, a well-known Chrome extension that helps overseas Chinese watch China’s major video streaming websites such as Youku, which is where its name comes from.
Because of copyright restrictions, most of those video streaming websites check whether a request comes from a mainland China IP address. Unblock-Youku modifies the HTTP request headers or uses a proxy server to make the website's server believe that the user is based in mainland China.
Let’s forget about the legal issue here and focus on the technical part :P
Unblock-Youku runs its tests on Travis CI. The tests have two parts: unit tests and headless browser tests using PhantomJS. Two weeks ago, after an upgrade that let the tests run on Node v4 and Node v5, we noticed that the tests suddenly started failing on Travis, especially in the v5 environment (see the log). However, the tests passed without problems in my local environment on both Node v4 and Node v5.
I investigated on my forked repo, which I set up on Travis. I found that the failure actually appears randomly: not only on v5, since v4 sometimes fails the build as well, and sometimes the tests pass without any error. When there is an error, it is always the first PhantomJS test case that fails, the one that requests an XML file served by a server on localhost. All the tests after it pass.
One thing in the build log caught my attention: in failed builds, the test failed immediately after we made the GET request, and only after that did some lines appear showing that we had “connected to 0.0.0.0:8888”. This was interesting because in successful builds those “connected to 0.0.0.0:8888” lines appeared immediately after we made the request, and the test eventually passed.
So when a build passes, the log looks something like this:
But when a build fails, the log always looks like this:
Does that mean our local server is not actually ready when we try to connect to it, and that this causes the failure?
After checking the Travis documentation, I confirmed my assumption. Travis asks users to give the web server some time to set up (things like binding to sockets) before running headless browser tests. Apparently we did not give the server enough time to start before trying to connect to it. That is why the first test case always fails, and why the log message showing that the server is running appears after we made the GET request.
Now, what is left to do is to wait until the socket is open for connections before running the tests. I made a simple fix and submitted a PR, and the tests now pass reliably on the Travis server.
This problem is actually quite easy to overlook. Although the test scripts already slept for 3 seconds to wait for the server to start, it seems that a fixed 3-second delay is not reliable enough to ensure the server is up and running. Fixing the issue helped me understand Travis and headless browser testing better, and the experience should help me investigate and solve similar bugs in the future.
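To illustrate the idea of waiting for the socket, here is a minimal sketch in Node. It is not the code from the PR: the helper name waitForPort, its parameters, and the retry interval are all hypothetical, and it uses only Node's built-in net module. It keeps trying to open a TCP connection to the port until one succeeds or a deadline passes.

```javascript
'use strict';
const net = require('net');

// Hypothetical helper (not from the Unblock-Youku repo).
// Resolves as soon as a TCP connection to host:port succeeds;
// rejects if every attempt up to the deadline ends in an error.
function waitForPort(port, host, timeoutMs, retryMs) {
  const deadline = Date.now() + timeoutMs;
  return new Promise(function (resolve, reject) {
    function attempt() {
      const socket = net.connect({ port: port, host: host });
      socket.once('connect', function () {
        // The server is accepting connections, so we can stop polling.
        socket.destroy();
        resolve();
      });
      socket.once('error', function (err) {
        socket.destroy();
        if (Date.now() >= deadline) {
          reject(new Error('port ' + port + ' not open after ' +
            timeoutMs + ' ms: ' + err.message));
        } else {
          // Typically ECONNREFUSED while the server is still starting.
          setTimeout(attempt, retryMs);
        }
      });
    }
    attempt();
  });
}
```

A test runner could then call something like waitForPort(8888, '127.0.0.1', 10000, 100) and start the PhantomJS tests only once the promise resolves. Compared with a fixed sleep, this waits no longer than necessary on a fast machine and still copes with a slow one, up to the timeout. The sketch assumes a closed port refuses connections quickly, as it does on localhost; it does not handle a connection attempt that hangs without an error.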