HttpClient Executors

Prof. Cay Horstmann

Abstract:
Java 11 added the HttpClient to give us a better way to send HTTP requests. It supports both asynchronous and synchronous modes, and HTTP/2 works out of the box. The threading is a bit funky, though, and Professor Cay Horstmann explores how things work under the covers.

Welcome to the 271st edition of The Java(tm) Specialists' Newsletter. We have a guest author this month, Professor Cay Horstmann of Core Java fame. His article is based on some experiments that we did at JCrete, but the code has been almost completely rewritten. Kind regards - Heinz.

At JCrete 2019, Heinz Kabutz led a session that showed a mystery about configuring the thread pool for the HttpClient class. Setting a new executor didn't have the desired effect. It turns out that the implementation has changed (and perhaps not for the better), and the documentation is lagging. If you plan to use HttpClient asynchronously, you really want to pay attention to this. As a bonus, there are a few more useful tidbits about using HttpClient effectively.



The HttpClient was an incubator feature in Java 9 and has been, in its final form, a part of the Java API as of Java 11. It provides a more pleasant API than the classic HttpURLConnection class, has a nice asynchronous interface, and works with HTTP/2. This article deals with the asynchronous interface.

Suppose you want to read a web page and then process the body once it arrives. First make an HttpClient object:

    HttpClient client = HttpClient.newBuilder()
        // Redirect except https to http
        .followRedirects(HttpClient.Redirect.NORMAL)
        .build();

Then make a request:

    HttpRequest request = HttpRequest.newBuilder()
        .uri(new URI("http://horstmann.com"))
        .GET()
        .build();

Now get the response and process it, by adding to the completable future that the sendAsync method returns:

    client.sendAsync(request, HttpResponse.BodyHandlers.ofString())
        .thenAccept(response -> ...);

The sendAsync method uses non-blocking I/O to get the data. When the data is available, it is passed on to a callback for processing. The HttpClient makes use of the standard CompletableFuture interface. The function that was passed to thenAccept is called when the data is ready.

In which thread? Of course not in the thread that has called client.sendAsync. That thread has moved on to do other things.

The HttpClient.Builder class has a method executor:

    ExecutorService executor1 = Executors.newCachedThreadPool();
    HttpClient client = HttpClient.newBuilder()
        .executor(executor1)
        .followRedirects(HttpClient.Redirect.NORMAL)
        .build();

According to the JDK 11 docs, this "sets the executor to be used for asynchronous and dependent tasks".

Heinz's Image Grabber Mystery

At JCrete 2019, Heinz Kabutz demonstrated a program that grabbed Dilbert comics of the day, going to URLs of the form https://dilbert.com/strip/2019-08-21, finding the image URLs inside, and then loading the images.

Here is a slight simplification of the code.

The file ImageInfo.java has a class ImageInfo that holds the image URL and binary data. A subclass DilbertImageInfo has the Dilbert-specific details for getting the URL of the web page and for extracting the image URL from it. A class WikimediaImageInfo does the same for the Wikimedia image of the day.

Because two requests are needed for fetching each image, it is convenient to make a helper method:

    public <T> CompletableFuture<T> getAsync(
            String url,
            HttpResponse.BodyHandler<T> responseBodyHandler) {
        HttpRequest request = HttpRequest.newBuilder()
            .GET()
            .uri(URI.create(url))
            .build();
        return client.sendAsync(request, responseBodyHandler)
            .thenApply(HttpResponse::body);
    }

This helper method is called in two methods for getting the image URL and data:

    private CompletableFuture<ImageInfo> findImageInfo(
            LocalDate date, ImageInfo info) {
        return getAsync(info.getUrlForDate(date),
                HttpResponse.BodyHandlers.ofString())
            .thenApply(info::findImage);
    }

    private CompletableFuture<ImageInfo> findImageData(
            ImageInfo info) {
        return getAsync(info.getImagePath(),
                HttpResponse.BodyHandlers.ofByteArray())
            .thenApply(info::setImageData);
    }

Now we are ready for our processing pipeline:

    public void load(LocalDate date, ImageInfo info) {
        findImageInfo(date, info)
            .thenCompose(this::findImageData)
            .thenAccept(this::process);
    }

The process method shows the image in a frame. See dailyImages/ImageProcessor.java for the complete code.

But it didn't work. By printing a message in process that included Thread.currentThread(), it was clear that the thread was from the global fork-join pool, not the provided executor. On my Linux laptop, the program just hung, and on Heinz's Mac, it crashed with an out of memory error when trying to fetch 10,000 images.
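The thread diagnostic described above can be tried without touching a real site. The following is a minimal sketch, not the code from the JCrete session: the class name CallbackThreadDemo and the method callbackThreadName are made up for illustration, and the JDK's built-in com.sun.net.httpserver.HttpServer serves as a throwaway local server. The client is pinned to HTTP/1.1 only to keep the demo simple. Which thread name gets printed depends on the JDK version and the machine, so no particular output is promised.

```java
import com.sun.net.httpserver.HttpServer;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class, not part of the article's image grabber.
final class CallbackThreadDemo {
    /**
     * Serves one page from a throwaway local server, fetches it with
     * sendAsync, and returns the name of the thread that ran the callback.
     */
    static String callbackThreadName() {
        HttpServer server = null;
        try {
            // Port 0 asks the operating system for any free port.
            server = HttpServer.create(new InetSocketAddress("localhost", 0), 0);
            server.createContext("/", exchange -> {
                byte[] body = "hello".getBytes(StandardCharsets.UTF_8);
                exchange.sendResponseHeaders(200, body.length);
                try (var out = exchange.getResponseBody()) {
                    out.write(body);
                }
            });
            server.start();

            // No executor is configured on the client.
            HttpClient client = HttpClient.newBuilder()
                .version(HttpClient.Version.HTTP_1_1)
                .build();
            HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:"
                    + server.getAddress().getPort() + "/"))
                .GET()
                .build();
            return client.sendAsync(request, HttpResponse.BodyHandlers.ofString())
                // The dependent task reports the thread it runs on.
                .thenApply(response -> Thread.currentThread().getName())
                .get(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted", e);
        } catch (Exception e) {
            throw new IllegalStateException("Demo request failed", e);
        } finally {
            if (server != null) {
                server.stop(0);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("Callback ran in: " + callbackThreadName());
    }
}
```

Passing an executor to the builder and running the demo again is a quick way to check whether the reported thread changes on your JDK.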

The Executor

Heinz wasn't the first to notice that setting the executor doesn't work as expected; see this StackOverflow query.

The bug database gives some clues. This is a change in behavior since JDK 11. Nowadays, the "dependent" tasks are not executed by the provided executor, but by the common fork-join pool. However, the documentation hasn't been updated to track the change, and that's another bug.

Let's pick apart the statements from the change notice:

"This is more familiar to developers that already use CF". Don't bet on it. It is a common mistake to starve the common pool by running blocking tasks on it.
"and reduces the likelihood of the HTTP Client being starved of threads to execute its tasks". The HTTP client needs threads for managing the selector and its responses, and of course those should never be starved. It is foolish to assume that the same executor might be appropriate for both the internal workings of the HTTP client and the tasks that process its results (as noted in this bug report). One would hope that this was never the original intent of the design, except that the "asynchronous and dependent tasks" verbiage suggests that it might have been.
"This is just default behaviour, both the HTTP Client and CompletableFuture allow more fine-grain control, if needed." Indeed, as you saw, the executor method sets the executor for the internal workings. And you can control the dependent tasks by specifying an executor:
```
return client.sendAsync(request, responseBodyHandler)
    .thenApplyAsync(HttpResponse::body, executor2);
```

In this example, we want to load a potentially large number of images. We don't want a thread per image, so let's use a fixed thread pool.

 private ExecutorService executor2 = Executors.newFixedThreadPool(100);

And we do not want to set the executor for the HTTP client internals. We have no idea what it does, and there is no documentation on what kind of executor might be adequate.
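To check that thenApplyAsync really pins a dependent task to the executor you hand it, a small experiment without any HTTP traffic is enough. This is only a sketch: the class DependentExecutorDemo, the helper namedPool, and the thread name prefix "image-worker" are assumptions made for illustration, and supplyAsync stands in for client.sendAsync.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class; supplyAsync stands in for client.sendAsync.
final class DependentExecutorDemo {
    /** A fixed pool whose threads carry a recognizable name. */
    static ExecutorService namedPool(int size, String prefix) {
        AtomicInteger counter = new AtomicInteger();
        return Executors.newFixedThreadPool(size, runnable -> {
            Thread thread = new Thread(runnable,
                prefix + "-" + counter.incrementAndGet());
            thread.setDaemon(true); // do not keep the JVM alive for the demo
            return thread;
        });
    }

    /** Returns the name of the thread that ran the dependent stage. */
    static String dependentThreadName(Executor executor) {
        // Runs on CompletableFuture's default executor, not on ours.
        CompletableFuture<String> upstream =
            CompletableFuture.supplyAsync(() -> "<html>...</html>");
        return upstream
            // The Async variant with an explicit executor decides where
            // this stage runs, regardless of who completed upstream.
            .thenApplyAsync(body -> Thread.currentThread().getName(), executor)
            .join();
    }

    public static void main(String[] args) {
        ExecutorService executor2 = namedPool(4, "image-worker");
        System.out.println(dependentThreadName(executor2));
        executor2.shutdown();
    }
}
```

Keep in mind that only the stage created with the executor argument is pinned. A later non-async stage such as thenApply runs wherever the previous stage happened to complete, or in the calling thread if that stage is already done.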

Here is the takeaway for you:

Always provide an executor to the task that comes after sendAsync unless you know that the common fork-join pool is the right executor for that task.
Never call executor on an HttpClient builder unless you know that your executor is better (presumably after having studied and understood the source code of the HttpClient implementation).

The HttpClient implementation uses a cached thread pool for its tasks. On Linux, when fetching 10,000 images, there were never more than a few hundred concurrent tasks in the HttpClient executor (presumably all short-duration responses to selector events). On the Mac, the virtual machine ran out of memory after creating just over 2,000 threads - your mileage might vary. When supplying a fixed thread pool, the program hung on the Mac as it did on Linux.

Why Did the Program Hang?

The program simply calls loadAll to load all images and process them:

    public void loadAll() {
        long time = System.nanoTime();
        try {
            LocalDate date = LocalDate.now();
            for (int i = 0; i < NUMBER_TO_SHOW; i++) {
                ImageInfo info = new DilbertInfo();
                info.setDate(date.toString());
                System.out.println("Loading " + date);
                load(date, info);
                date = date.minusDays(1);
            }
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            System.err.println("Interrupted");
        } finally {
            time = System.nanoTime() - time;
            System.out.printf("time = %dms%n", (time / 1_000_000));
        }
    }

The latch is initialized as

 private final CountDownLatch latch = new CountDownLatch(NUMBER_TO_SHOW);

The process method, which is called as the last part of the pipeline in the load method, calls:

 latch.countDown()

That way, the loadAll method doesn't terminate until all of the images are loaded.

This is a "happy day" design that won't hold up in real life. If anything goes wrong in the pipeline, then process may never be called.

We need to put the equivalent of a finally clause into the processing pipeline to make sure that the latch is counted down after each image has either been processed, or a failure has occurred. Here is how you do that:

    public void load(LocalDate date, ImageInfo info) {
        findImageInfo(date, info)
            .thenCompose(this::findImageData)
            .thenAccept(this::process)
            .whenComplete((x, t) -> latch.countDown());
    }

The whenComplete action is invoked with the result or exceptional outcome of the completable future.

If you want to see if an exception occurred, you can check that t is not null, and then print the stack trace. Or you can sandwich in an exception handler:

    public void load(LocalDate date, ImageInfo info) {
        findImageInfo(date, info)
            .thenCompose(this::findImageData)
            .thenAccept(this::process)
            .exceptionally(t -> {
                t.printStackTrace();
                return null;
            })
            .thenAccept(t -> latch.countDown());
    }

With this change, the program will terminate.
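The other option mentioned above, checking in whenComplete whether t is null, can be sketched with simulated pipelines instead of real downloads. The class LatchPipelineDemo, the method countFailures, and the failure rule (every failEvery-th pipeline throws) are made up for illustration. The point is only that the latch is counted down on both the success and the failure path.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class with simulated pipelines instead of downloads.
final class LatchPipelineDemo {
    /**
     * Runs total simulated pipelines in which every failEvery-th one
     * throws, and returns how many failures whenComplete observed.
     */
    static int countFailures(int total, int failEvery) {
        CountDownLatch latch = new CountDownLatch(total);
        AtomicInteger failures = new AtomicInteger();
        for (int i = 1; i <= total; i++) {
            int id = i;
            CompletableFuture
                .supplyAsync(() -> {
                    if (id % failEvery == 0) {
                        throw new IllegalStateException("simulated failure " + id);
                    }
                    return "image " + id;
                })
                .thenAccept(image -> { /* stand-in for process(image) */ })
                .whenComplete((ignored, t) -> {
                    if (t != null) { // the pipeline failed somewhere upstream
                        failures.incrementAndGet();
                        System.err.println("Pipeline " + id + " failed: " + t);
                    }
                    latch.countDown(); // runs on success and on failure
                });
        }
        try {
            // A timeout guards against hanging forever if the counting is wrong.
            if (!latch.await(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("Timed out waiting for pipelines");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted", e);
        }
        return failures.get();
    }

    public static void main(String[] args) {
        System.out.println("failures = " + countFailures(10, 3));
    }
}
```

With ten pipelines and failEvery set to 3, pipelines 3, 6, and 9 fail, and the latch still reaches zero.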

Here is the takeaway for you:

When you chain completable futures, don't just think of the happy day scenario. Be sure to handle exceptions, particularly if you need to manage counters or resources.

What About the Missing Images?

Now when you run the program against the Dilbert site and try getting a thousand images, you can see that the site simply refuses to serve up that many. You get exceptions that are caused by:

 javax.net.ssl.SSLHandshakeException: Remote host terminated the handshake

The site hates people who hammer it, and turns them away. It actually remembers your IP address, and it takes a while to get back into its good graces.

That's why the Wikimedia image of the day site is more useful for testing. Its images might not be as funny, but the site earnestly tries to serve them. Still, it can't keep up. After all, a thousand requests are issued in an instant, and then the HttpClient awaits the responses. Some of the requests fail with an exception that is caused by:

 java.io.IOException: too many concurrent streams

This is how the HttpClient reacts to an HTTP/2 server that has sent it a "go away" response (after first having informed it about the maximum number of concurrent streams). Unfortunately, with the current implementation of the HttpClient, it is impossible to find out what the nature of the failure was. People are unhappy about that, as evidenced by this and this bug report and this StackOverflow question.

The easiest remedy is to space out the requests by some amount. In our testing, 100 ms worked fine.
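One way to space out the requests, sketched here under assumptions rather than taken from the article's program, is to start the i-th request through CompletableFuture.delayedExecutor (available since Java 9) with a delay of i times the gap. The class SpacedRequestsDemo and the methods startAfter and startOffsetsMillis are hypothetical names. The demo records start times instead of sending requests, so it runs offline.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical helper for staggering the start of asynchronous requests.
final class SpacedRequestsDemo {
    /** Starts an asynchronous task after the given delay and flattens its result. */
    static <T> CompletableFuture<T> startAfter(
            long delayMillis, Supplier<CompletableFuture<T>> starter) {
        Executor delayed =
            CompletableFuture.delayedExecutor(delayMillis, TimeUnit.MILLISECONDS);
        // The starter itself must not block; it only kicks off the request.
        return CompletableFuture.supplyAsync(starter, delayed)
            .thenCompose(future -> future);
    }

    /** Demo: records, in milliseconds, when each of count tasks was started. */
    static List<Long> startOffsetsMillis(int count, long gapMillis) {
        long t0 = System.nanoTime();
        List<CompletableFuture<Long>> futures = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            // In the image grabber, the starter would be something like
            // () -> client.sendAsync(request, handler).
            Supplier<CompletableFuture<Long>> starter = () -> {
                long elapsedMillis = (System.nanoTime() - t0) / 1_000_000;
                return CompletableFuture.completedFuture(elapsedMillis);
            };
            futures.add(startAfter(i * gapMillis, starter));
        }
        List<Long> offsets = new ArrayList<>();
        for (CompletableFuture<Long> future : futures) {
            offsets.add(future.join());
        }
        return offsets;
    }

    public static void main(String[] args) {
        System.out.println(startOffsetsMillis(5, 100));
    }
}
```

This only staggers when requests start. It does not limit how many are in flight at once, so slow responses can still pile up.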

Should the HttpClient take this issue on? By retrying some number of times? Or spacing out requests? Or should there be better error reporting so that the users of the class can make those decisions? As it is, the HttpClient isn't quite ready for the messiness of the real world.

Here is the takeaway for you:

Just because an API has graduated from an incubator doesn't mean that all kinks have been worked out. If the API doesn't make explicit promises for hard cases, it'll be up to you to tackle them.
