HttpClient Executors
Prof. Cay Horstmann
Abstract:
Java 11 added the HttpClient to give us a better way to send HTTP requests. It supports asynchronous and synchronous modes, and HTTP/2 comes out of the box. The threading is a bit funky, though, and Professor Cay Horstmann explores how things work under the covers.
Welcome to the 271st edition of The Java(tm) Specialists' Newsletter. We have a guest author this month, Professor Cay Horstmann of Core Java fame. His article is based on some experiments that we did at JCrete, but the code has been almost completely rewritten. Kind regards - Heinz.
At JCrete 2019, Heinz Kabutz led a session that presented a mystery about configuring the thread pool for the HttpClient class: setting a new executor didn't have the desired effect. It turns out that the implementation has changed (and perhaps not for the better), and the documentation is lagging. If you plan to use HttpClient asynchronously, you really want to pay attention to this. As a bonus, there are a few more useful tidbits about using HttpClient effectively.
HttpClient Executors
The HttpClient was an incubator feature in Java 9 and has been, in its final form, a part of the Java API as of Java 11. It provides a more pleasant API than the classic HttpURLConnection class, has a nice asynchronous interface, and works with HTTP/2. This article deals with the asynchronous interface.
Suppose you want to read a web page and then process the body once it arrives. First make an HttpClient object:
HttpClient client = HttpClient.newBuilder()
    // Follow redirects, except from https to http URLs
    .followRedirects(HttpClient.Redirect.NORMAL)
    .build();
Then make a request:
HttpRequest request = HttpRequest.newBuilder()
    .uri(new URI("http://horstmann.com"))
    .GET()
    .build();
Now get the response and process it by chaining an action onto the CompletableFuture that the sendAsync method returns:
client.sendAsync(request, HttpResponse.BodyHandlers.ofString())
    .thenAccept(response -> ...);
The sendAsync method uses non-blocking I/O to get the data. When the data is available, it is passed on to a callback for processing. The HttpClient makes use of the standard CompletableFuture class: the function that was passed to thenAccept is called when the data is ready.
In which thread? Certainly not in the thread that called client.sendAsync. That thread has moved on to do other things.
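To watch this happen without depending on an outside site, here is a minimal sketch that starts a throwaway local server with the JDK's com.sun.net.httpserver package (an assumption of this example, not part of the article's code). The class and method names AsyncFetchDemo, startServer, and fetch are hypothetical. It prints the name of the thread that calls sendAsync and of the thread that runs the callback; which thread runs the callback depends on timing and on the JDK version, so treat the output as an observation rather than a guarantee.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.CompletableFuture;

class AsyncFetchDemo {
    // Local server on an ephemeral port, so the sketch runs without internet access.
    static HttpServer startServer(String body) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/", exchange -> {
            byte[] bytes = body.getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, bytes.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(bytes);
            }
        });
        server.start();
        return server;
    }

    static String fetch(String body) {
        HttpServer server;
        try {
            server = startServer(body);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        try {
            HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
            URI uri = URI.create("http://127.0.0.1:" + server.getAddress().getPort() + "/");
            HttpRequest request = HttpRequest.newBuilder().uri(uri).GET().build();

            System.out.println("sendAsync called in " + Thread.currentThread().getName());
            CompletableFuture<String> result = client
                .sendAsync(request, HttpResponse.BodyHandlers.ofString())
                .thenApply(response -> {
                    // Runs when the response has arrived, typically not in the calling thread
                    System.out.println("callback runs in " + Thread.currentThread().getName());
                    return response.body();
                });
            // The caller could do other work here; join() only waits so the demo can return a value.
            return result.join();
        } finally {
            server.stop(0);
        }
    }

    public static void main(String[] args) {
        System.out.println(fetch("hello"));
    }
}
```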
The HttpClient.Builder class has a method executor:
ExecutorService executor1 = Executors.newCachedThreadPool();
HttpClient client = HttpClient.newBuilder()
    .executor(executor1)
    .followRedirects(HttpClient.Redirect.NORMAL)
    .build();
According to the JDK 11 docs, this "sets the executor to be used for asynchronous and dependent tasks".
Heinz's Image Grabber Mystery
At JCrete 2019, Heinz Kabutz demonstrated a program that grabbed Dilbert comics of the day, going to URLs of the form https://dilbert.com/strip/2019-08-21, finding the image URLs inside, and then loading the images.
Here is a slight simplification of the code.
The file ImageInfo.java has a class ImageInfo that holds the image URL and binary data. A subclass DilbertImageInfo has the Dilbert-specific details for getting the URL of the web page and for extracting the image URL from it. A class WikimediaImageInfo does the same for the Wikimedia image of the day.
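ImageInfo.java itself isn't reproduced here, so the following is a hypothetical reconstruction inferred only from the methods the pipeline below calls (getUrlForDate, findImage, getImagePath, setImageData, setDate). In this sketch, findImage and setImageData return this, which is one way to let the pipeline pass info::findImage and info::setImageData straight to thenApply. The SketchDilbertInfo subclass and its regular expression are assumptions; only the URL form comes from the article, and real extraction logic is likely site-specific.

```java
import java.time.LocalDate;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical reconstruction; the article's ImageInfo.java may differ.
abstract class ImageInfo {
    private String date;
    private String imagePath;
    private byte[] imageData;

    // Site-specific: the URL of the page that contains the image for a date.
    public abstract String getUrlForDate(LocalDate date);

    // Site-specific: a pattern whose group 1 captures the image URL.
    protected abstract Pattern imagePattern();

    public void setDate(String date) { this.date = date; }
    public String getDate() { return date; }
    public String getImagePath() { return imagePath; }
    public byte[] getImageData() { return imageData; }

    // Returns this so that a pipeline can use thenApply(info::findImage).
    public ImageInfo findImage(String html) {
        Matcher matcher = imagePattern().matcher(html);
        if (!matcher.find()) {
            // Inside thenApply, this makes the stage complete exceptionally.
            throw new IllegalStateException("No image found for " + date);
        }
        imagePath = matcher.group(1);
        return this;
    }

    // Returns this so that a pipeline can use thenApply(info::setImageData).
    public ImageInfo setImageData(byte[] imageData) {
        this.imageData = imageData;
        return this;
    }
}

// Hypothetical subclass: URL form from the article, pattern assumed (first <img src="...">).
class SketchDilbertInfo extends ImageInfo {
    private static final Pattern IMG_SRC = Pattern.compile("<img[^>]+src=\"([^\"]+)\"");

    @Override
    public String getUrlForDate(LocalDate date) {
        return "https://dilbert.com/strip/" + date; // LocalDate.toString() is ISO, e.g. 2019-08-21
    }

    @Override
    protected Pattern imagePattern() {
        return IMG_SRC;
    }
}
```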
Because two requests are needed for fetching each image, it is convenient to make a helper method:
public <T> CompletableFuture<T> getAsync(
        String url, HttpResponse.BodyHandler<T> responseBodyHandler) {
    HttpRequest request = HttpRequest.newBuilder()
        .GET()
        .uri(URI.create(url))
        .build();
    return client.sendAsync(request, responseBodyHandler)
        .thenApply(HttpResponse::body);
}
This helper method is called in two methods for getting the image URL and data:
private CompletableFuture<ImageInfo> findImageInfo(
        LocalDate date, ImageInfo info) {
    return getAsync(info.getUrlForDate(date),
                    HttpResponse.BodyHandlers.ofString())
        .thenApply(info::findImage);
}

private CompletableFuture<ImageInfo> findImageData(
        ImageInfo info) {
    return getAsync(info.getImagePath(),
                    HttpResponse.BodyHandlers.ofByteArray())
        .thenApply(info::setImageData);
}
Now we are ready for our processing pipeline:
public void load(LocalDate date, ImageInfo info) {
    findImageInfo(date, info)
        .thenCompose(this::findImageData)
        .thenAccept(this::process);
}
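The pipeline mixes thenApply and thenCompose for a reason. Here is a self-contained sketch that mimics the two round trips with supplyAsync instead of real HTTP requests; fetchPage, findImagePath, fetchImage, and the example.com URL are hypothetical stand-ins. thenApply suits a synchronous step such as parsing, while thenCompose is needed for the second asynchronous request, because thenApply would produce a nested CompletableFuture<CompletableFuture<byte[]>>.

```java
import java.nio.charset.StandardCharsets;
import java.util.concurrent.CompletableFuture;

class ComposeDemo {
    // Hypothetical stand-in for the first request: a page that mentions an image.
    static CompletableFuture<String> fetchPage(String date) {
        return CompletableFuture.supplyAsync(
            () -> "<img src=\"https://example.com/" + date + ".gif\">");
    }

    // Synchronous parsing step: extracts the value of the first src attribute.
    static String findImagePath(String html) {
        int start = html.indexOf("src=\"") + "src=\"".length();
        return html.substring(start, html.indexOf('"', start));
    }

    // Hypothetical stand-in for the second request: the "image bytes" are just the path.
    static CompletableFuture<byte[]> fetchImage(String path) {
        return CompletableFuture.supplyAsync(() -> path.getBytes(StandardCharsets.UTF_8));
    }

    static CompletableFuture<byte[]> pipeline(String date) {
        return fetchPage(date)
            .thenApply(ComposeDemo::findImagePath) // String -> String
            .thenCompose(ComposeDemo::fetchImage); // String -> CompletableFuture<byte[]>, flattened
    }

    public static void main(String[] args) {
        byte[] image = pipeline("2019-08-21").join();
        System.out.println(new String(image, StandardCharsets.UTF_8));
    }
}
```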
The process method shows the image in a frame. See dailyImages/ImageProcessor.java for the complete code.
But it didn't work. A message printed in process that included Thread.currentThread() made it clear that the thread came from the common fork-join pool, not from the provided executor. On my Linux laptop, the program just hung, and on Heinz's Mac, it crashed with an out-of-memory error when trying to fetch 10,000 images.
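The following sketch, which uses no HttpClient at all, illustrates the rule that decides where such a callback lands: a non-async stage such as thenAccept that is attached before the future completes runs in whichever thread completes the future. A hand-made thread named completer plays the role of the HttpClient internals; the class and method names are hypothetical.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicReference;

class WhoRunsTheCallback {
    static String threadThatRanCallback() {
        CompletableFuture<String> response = new CompletableFuture<>();
        AtomicReference<String> callbackThread = new AtomicReference<>();

        // Non-async stage attached while the future is still incomplete.
        CompletableFuture<Void> dependent = response.thenAccept(
            body -> callbackThread.set(Thread.currentThread().getName()));

        // Stands in for the HttpClient completing the future it returned.
        Thread completer = new Thread(() -> response.complete("<html/>"), "completer");
        completer.start();
        dependent.join(); // waits until the callback has run
        return callbackThread.get();
    }

    public static void main(String[] args) {
        System.out.println(threadThatRanCallback()); // prints "completer"
    }
}
```

So the thread that runs process is whatever thread the HttpClient uses to complete its future, which is why the printed thread name reveals the pool.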
The Executor
Heinz wasn't the first to notice that setting the executor doesn't work as expected; see this StackOverflow query.
The bug database gives some clues. This is a change in behavior since JDK 11. Nowadays, the "dependent" tasks are not executed by the provided executor, but by the common fork-join pool. However, the documentation hasn't been updated to track the change, and that's another bug.
Let's pick apart the statements from the change notice:
- "This is more familiar to developers that already use CF". Don't bet on it. It is a common mistake to starve the common pool by running blocking tasks on it.
- "and reduces the likelihood of the HTTP Client being starved of threads to execute its tasks". The HTTP client needs threads for managing the selector and its responses, and of course those should never be starved. It is foolish to assume that the same executor might be appropriate for both the internal workings of the HTTP client and the tasks that process its results (as noted in this bug report). One would hope that this was never the original intent of the design, except that the "asynchronous and dependent tasks" verbiage suggests that it might have been.
- "This is just default behaviour, both the HTTP Client and CompletableFuture allow more fine-grain control, if needed." Indeed, the executor method sets the executor for the internal workings. And you can control the dependent tasks by passing an executor to an async stage, for example in the getAsync helper:
      return client.sendAsync(request, responseBodyHandler)
          .thenApplyAsync(HttpResponse::body, executor2);
In this example, we want to load a potentially large number of images. We don't want a thread per image, so let's use a fixed thread pool.
private ExecutorService executor2 = Executors.newFixedThreadPool(100);
And we do not want to set the executor for the HTTP client internals. We have no idea what it does, and there is no documentation about what kind of executor might be adequate.
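Here is a sketch of that approach, assuming a thread factory that names the threads so you can check where a stage ran (the image-worker- prefix and the class and method names are hypothetical). To keep it runnable offline, an already-completed future stands in for the result of client.sendAsync; with a real request, the same thenApplyAsync call with executor2 applies.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

class DedicatedExecutorDemo {
    // Fixed pool with named daemon threads, so the stage's thread is easy to identify.
    static ExecutorService namedFixedPool(String prefix, int size) {
        AtomicInteger count = new AtomicInteger();
        ThreadFactory factory = r -> {
            Thread t = new Thread(r, prefix + count.incrementAndGet());
            t.setDaemon(true);
            return t;
        };
        return Executors.newFixedThreadPool(size, factory);
    }

    // Returns the name of the thread that ran the processing stage.
    static String processOn(ExecutorService executor2) {
        // Stand-in for client.sendAsync(...): an already completed response body.
        CompletableFuture<String> response = CompletableFuture.completedFuture("<html/>");
        return response
            .thenApplyAsync(body -> Thread.currentThread().getName(), executor2)
            .join();
    }

    public static void main(String[] args) {
        ExecutorService executor2 = namedFixedPool("image-worker-", 100);
        try {
            System.out.println(processOn(executor2));
        } finally {
            executor2.shutdown();
        }
    }
}
```

Because the executor is passed explicitly, the stage runs on the fixed pool no matter which thread completed the response future.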
Here is the takeaway for you:
- Always provide an executor to the task that comes after sendAsync, unless you know that the common fork-join pool is the right executor for that task.
- Never call executor on an HttpClient builder unless you know that your executor is better (presumably after having studied and understood the source code of the HttpClient implementation).
By default, the HttpClient implementation uses a cached thread pool for its tasks. On Linux, when fetching 10,000 images, there were never more than a few hundred concurrent tasks in the HttpClient executor (presumably all short-duration responses to selector events). On the Mac, the virtual machine ran out of memory after creating just over 2,000 threads; your mileage may vary. When a fixed thread pool was supplied instead, the program hung on the Mac as it did on Linux.
Why Did the Program Hang?
The program simply calls loadAll to load all images and process them:
public void loadAll() {
    long time = System.nanoTime();
    try {
        LocalDate date = LocalDate.now();
        for (int i = 0; i < NUMBER_TO_SHOW; i++) {
            ImageInfo info = new DilbertInfo();
            info.setDate(date.toString());
            System.out.println("Loading " + date);
            load(date, info);
            date = date.minusDays(1);
        }
        latch.await();
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        System.err.println("Interrupted");
    } finally {
        time = System.nanoTime() - time;
        System.out.printf("time = %dms%n", (time / 1_000_000));
    }
}
The latch is initialized as
private final CountDownLatch latch = new CountDownLatch(NUMBER_TO_SHOW);
The process method, which is called as the last part of the pipeline in the load method, calls latch.countDown(). That way, the loadAll method doesn't terminate until all of the images are loaded.
This is a "happy day" design that won't hold up in real life. If anything goes wrong in the pipeline, then process may never be called, the latch is never counted down, and latch.await() in loadAll blocks forever.
We need to put the equivalent of a finally clause into the processing pipeline to make sure that the latch is counted down after each image has either been processed, or a failure has occurred. Here is how you do that:
public void load(LocalDate date, ImageInfo info) {
    findImageInfo(date, info)
        .thenCompose(this::findImageData)
        .thenAccept(this::process)
        .whenComplete((x, t) -> latch.countDown());
}
The whenComplete action is invoked whether the completable future completed normally or exceptionally. It receives the result (x) and the exception (t); one of them is null.
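A minimal sketch of the difference, assuming pre-completed futures in place of the real HTTP pipeline (so each stage runs immediately in the calling thread and getCount() reflects the outcome right away). With a failed source, the happy-day version leaves the latch at 1, which is what makes latch.await() block forever; the whenComplete version counts down either way. The class and method names are hypothetical.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;

class LatchCountdownDemo {
    // Happy-day version: only a successful pipeline reaches countDown().
    static long remainingHappyDay(CompletableFuture<String> page) {
        CountDownLatch latch = new CountDownLatch(1);
        page.thenAccept(body -> latch.countDown());
        return latch.getCount();
    }

    // whenComplete acts like a finally clause: it runs on success and on failure.
    static long remainingWithWhenComplete(CompletableFuture<String> page) {
        CountDownLatch latch = new CountDownLatch(1);
        page.thenAccept(body -> { /* process(body) would go here */ })
            .whenComplete((ignored, t) -> latch.countDown());
        return latch.getCount();
    }

    public static void main(String[] args) {
        CompletableFuture<String> failed =
            CompletableFuture.failedFuture(new RuntimeException("request refused"));
        System.out.println(remainingHappyDay(failed));         // 1: await() would never return
        System.out.println(remainingWithWhenComplete(failed)); // 0
    }
}
```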
If you want to see if an exception occurred, you can check that t is not null, and then print the stack trace. Or you can sandwich in an exception handler:
public void load(LocalDate date, ImageInfo info) {
    findImageInfo(date, info)
        .thenCompose(this::findImageData)
        .thenAccept(this::process)
        .exceptionally(t -> {
            t.printStackTrace();
            return null;
        })
        .thenAccept(ignored -> latch.countDown());
}
With this change, the program will terminate.
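One detail worth knowing when you write such a handler: in a chain like the one above, the handler sits downstream of other stages, so t arrives as a CompletionException that wraps the original failure. The sketch below simulates a failed request with an IOException carrying a made-up message; the UnwrapCause class and its unwrap helper are hypothetical.

```java
import java.io.IOException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

class UnwrapCause {
    // Dependent stages wrap failures in CompletionException; unwrap to reach the original.
    static Throwable unwrap(Throwable t) {
        return (t instanceof CompletionException && t.getCause() != null) ? t.getCause() : t;
    }

    static String describeFailure() {
        CompletableFuture<String> request = new CompletableFuture<>();
        CompletableFuture<String> described = request
            .thenApply(body -> "processed " + body)
            .exceptionally(t -> "failed: " + t.getClass().getSimpleName()
                                + " caused by " + unwrap(t));
        // Simulated failure of the underlying request.
        request.completeExceptionally(new IOException("simulated failure"));
        return described.join();
    }

    public static void main(String[] args) {
        // prints "failed: CompletionException caused by java.io.IOException: simulated failure"
        System.out.println(describeFailure());
    }
}
```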
Here is the takeaway for you:
- When you chain completable futures, don't just think of the happy day scenario. Be sure to handle exceptions, particularly if you need to manage counters or resources.
What About the Missing Images?
Now when you run the program against the Dilbert site and try getting a thousand images, you can see that the site simply refuses to serve up that many. You get exceptions that are caused by:
javax.net.ssl.SSLHandshakeException: Remote host terminated the handshake
The site hates people who hammer it, and turns them away. It actually remembers your IP address, and it takes a while to get back into its good graces.
That's why the Wikimedia image of the day site is more useful for testing. Its images might not be as funny, but it earnestly tries to serve them. Still, it can't keep up. After all, a thousand requests are issued in an instant, and then the HttpClient awaits the responses. Some of the requests fail with an exception that is caused by:
java.io.IOException: too many concurrent streams
This is how the HttpClient reacts to an HTTP/2 server that has sent it a "go away" response (after first having informed it about the maximum number of concurrent streams). Unfortunately, with the current implementation of the HttpClient, it is impossible to find out what the nature of the failure was. People are unhappy about that, as evidenced by these two bug reports and this StackOverflow question.
The easiest remedy is to space out the requests. In our testing, a delay of 100 ms between requests worked fine.
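Here is one way to space out the starts, sketched with CompletableFuture.delayedExecutor (Java 9 and later) rather than Thread.sleep, so no thread blocks while waiting. The SpacedRequests class, the startSpaced method, and the Supplier wrapping are assumptions, not code from the article; each Supplier would wrap a call such as getAsync(url, handler).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

class SpacedRequests {
    /**
     * Starts request i no earlier than i * spacingMillis after this call.
     * Each Supplier stands in for a call such as getAsync(url, handler).
     */
    static <T> List<CompletableFuture<T>> startSpaced(
            List<Supplier<CompletableFuture<T>>> requests, long spacingMillis) {
        List<CompletableFuture<T>> results = new ArrayList<>();
        for (int i = 0; i < requests.size(); i++) {
            Executor delayed = CompletableFuture.delayedExecutor(
                i * spacingMillis, TimeUnit.MILLISECONDS);
            // Only the start of each request is delayed; the response is still handled asynchronously.
            results.add(CompletableFuture
                .supplyAsync(requests.get(i), delayed)
                .thenCompose(future -> future));
        }
        return results;
    }
}
```

With the article's numbers, you would call startSpaced(requests, 100). Because every delay is measured from the same call, request i starts at least i times 100 ms after the batch is submitted.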
Should the HttpClient take this issue on, by retrying some number of times or by spacing out requests? Or should there be better error reporting so that the users of the class can make those decisions? As it is, the HttpClient isn't quite ready for the messiness of the real world.
Here is the takeaway for you:
- Just because an API has graduated from an incubator doesn't mean that all kinks have been worked out. If the API doesn't make explicit promises for hard cases, it'll be up to you to tackle them.