HikariPool-1 - Connection is not available

This afternoon, I got a report that website A was not accessible. I logged into the server and first checked the Nginx error log. It said the connection to the upstream had timed out.

...
2019/04/20 09:58:42 [error] 6327#0: *129762 upstream timed out (110: Connection timed out) while reading response header from upstream, client: <xxx.xxx.xx.xx>, server: <xxx.xxxxxxxx.xxx>, request: "GET /api/user/me HTTP/1.1", upstream: "http://127.0.0.1:5000/api/user/me", host: "<xxx.xxxxxxxx.xxx>", referrer: "https://<xxx.xxxxxxxx.xxx>/page/home"
2019/04/20 09:58:42 [error] 6327#0: *129763 upstream timed out (110: Connection timed out) while reading response header from upstream, client: <xxx.xxx.xx.xx>, server: <xxx.xxxxxxxx.xxx>, request: "GET /api/home/loadHomeData HTTP/1.1", upstream: "http://127.0.0.1:5000/api/home/loadHomeData", host: "<xxx.xxxxxxxx.xxx>", referrer: "https://<xxx.xxxxxxxx.xxx>/page/home"
...

Then I checked the log from Spring:

org.springframework.transaction.CannotCreateTransactionException: Could not open JPA EntityManager for transaction; nested exception is org.hibernate.exception.JDBCConnectionException: Unable to acquire JDBC Connection
...
Caused by: org.hibernate.exception.JDBCConnectionException: Unable to acquire JDBC Connection
... 61 common frames omitted
Caused by: java.sql.SQLTransientConnectionException: HikariPool-1 - Connection is not available, request timed out after 30000ms.
...

It clearly indicates that the connection pool has run out of free connections. Connections are not being returned to the pool as expected. Incoming requests attempting to acquire a connection wait on it and finally time out, which in turn causes the reads from Nginx to time out.

Out of curiosity, I checked the network connection status with:

netstat -peanut

As shown below, the majority of them are in CLOSE_WAIT. If you still remember the TCP state transitions, the server side of a TCP connection enters the CLOSE_WAIT state when it receives a FIN from its client. Thus, we can infer that the connections were originally issued from 127.0.0.1:xxxxx to 127.0.0.1:5000. This makes sense, as 127.0.0.1:5000 is the upstream server to which Nginx proxies incoming requests.

Proto Recv-Q Send-Q Local Address           Foreign Address         State       User       Inode       PID/Program name
tcp 1 0 127.0.0.1:5000 127.0.0.1:13160 CLOSE_WAIT 1000 66628224 4209/java
tcp 1 0 127.0.0.1:5000 127.0.0.1:13152 CLOSE_WAIT 1000 66628218 4209/java
tcp 1 0 127.0.0.1:5000 127.0.0.1:57766 CLOSE_WAIT 1000 62175223 4209/java
tcp 1 0 127.0.0.1:5000 127.0.0.1:21812 CLOSE_WAIT 1000 71135785 4209/java
tcp 1 0 127.0.0.1:5000 127.0.0.1:13150 CLOSE_WAIT 1000 66628836 4209/java
...
tcp 1 0 127.0.0.1:5000 127.0.0.1:13794 CLOSE_WAIT 1000 66644609 4209/java
tcp 1 0 127.0.0.1:5000 127.0.0.1:13154 CLOSE_WAIT 1000 66628222 4209/java
tcp 1 0 127.0.0.1:5000 127.0.0.1:13792 CLOSE_WAIT 1000 66637431 4209/java
tcp 1 0 127.0.0.1:5000 127.0.0.1:30190 CLOSE_WAIT 1000 64263646 4209/java
tcp 1 0 127.0.0.1:5000 127.0.0.1:13228 CLOSE_WAIT 1000 66629637 4209/java
tcp 1 0 127.0.0.1:5000 127.0.0.1:35004 CLOSE_WAIT 1000 72919978 4209/java
tcp 1 0 127.0.0.1:5000 127.0.0.1:13210 CLOSE_WAIT 1000 66629002 4209/java
tcp 1 0 127.0.0.1:5000 127.0.0.1:13198 CLOSE_WAIT 1000 66628344 4209/java
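To get a quick overview instead of scanning the raw output, the states can be tallied with a short pipeline. The sketch below uses a small inline sample for illustration; on the server you would pipe the real `netstat -peanut` output into the same `awk | sort | uniq -c` chain (the state is the sixth field):

```shell
# Tally TCP connection states; field 6 of netstat's output is the state.
# A tiny inline sample stands in for the real `netstat -peanut` output.
sample='tcp 1 0 127.0.0.1:5000 127.0.0.1:13160 CLOSE_WAIT
tcp 0 0 127.0.0.1:5000 127.0.0.1:13161 ESTABLISHED
tcp 1 0 127.0.0.1:5000 127.0.0.1:13162 CLOSE_WAIT'
echo "$sample" | awk '{print $6}' | sort | uniq -c | sort -rn
```

A spike in the CLOSE_WAIT count jumps out immediately this way.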

So, the problem is clear. Then what's the cause? Unfortunately, I hadn't previously enabled HikariCP's connection leak detection. Nor had I enabled PostgreSQL's slow query log.

HikariCP has been the default connection pool in Spring Boot since 2.0. I checked its configuration with the Spring Boot Actuator endpoint.

For your interest, the actuator was configured as follows:

@Configuration
public class ActuatorSecurityConfig extends WebSecurityConfigurerAdapter {

    @Override
    protected void configure(HttpSecurity http) throws Exception {
        http.requestMatcher(EndpointRequest.toAnyEndpoint()).authorizeRequests()
                .anyRequest().hasRole("SUPER_ADMIN")
                .and()
                .httpBasic();
    }
}

Since the upstream server is bound only to 127.0.0.1, I can only invoke that endpoint from within the server.

curl -u <account>:<password>  http://127.0.0.1:5000/actuator/configprops

Yes, sad but true: leakDetectionThreshold was at its default value of 0 (disabled).

To enable it, I added the following config to my application.yml:

spring:
  datasource:
    hikari:
      leak-detection-threshold: 30000 # 30 seconds

By the way, I also changed com.zaxxer.hikari's log level to DEBUG, following the advice from the author of HikariCP here.

<?xml version="1.0" encoding="UTF-8"?>
<!-- logback-spring.xml in classpath root -->
<configuration>
    <include resource="org/springframework/boot/logging/logback/base.xml"/>
    <logger name="com.zaxxer.hikari" level="DEBUG"/>
</configuration>

That's what I have done to instrument my Spring Boot server. Hopefully it will catch some useful info when the same problem happens again.

That's not enough; PostgreSQL, as the database, should be able to tell us something useful too. So I enabled the slow query log there by editing postgresql.conf:

# log to file with suffix ".log"
log_destination = 'stderr'
# required to be on for generating log files
logging_collector = on
# the directory must exist and the postgres server must have read & write permission on it
log_directory = '/var/log/postgresql'
log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'
# enable log file rotation
log_truncate_on_rotation = on
# rotate log file on a daily basis
log_rotation_age = 1d
# if a query takes more than 2 seconds, log it
log_min_duration_statement = 2000
# if this is set to anything other than `none`, log_min_duration_statement will be ignored
log_statement = 'none'
log_timezone = 'Asia/Shanghai'

You can locate postgresql.conf file with query:

show config_file;

Some settings require a restart after modification. If a restart is not required, simply executing the SQL below is enough to make them take effect.

SELECT pg_reload_conf();

Try it with:

SELECT pg_sleep(7.5);

A corresponding entry should now be logged in the configured location.

You can also read this article, which explains why simply increasing maximumPoolSize as a countermeasure for this problem is not a good idea.

References:

How to identify slow queries in PostgreSQL

How to show queries log in PostgreSQL?

HikariCP

Production Ready Endpoints

logging

How to copy files fast

First, let's generate a dummy file for testing. We can do this extremely fast with the following command.

fallocate -l 2G copy_test.img

import java.io.*;
import java.nio.file.Files;

public class Main {

    public static void main(String[] args) {
        File source = new File("/home/benson/develop/test/copy_test.img");
        File target1 = new File("/home/benson/develop/test/target1.img");
        File target2 = new File("/home/benson/develop/test/target2.img");
        File target3 = new File("/home/benson/develop/test/target3.img");
        System.out.println("Source file is of size: " + source.length());
        long pre = System.currentTimeMillis();
        long now = -1;
        try (FileInputStream fin = new FileInputStream(source);
             FileOutputStream fout = new FileOutputStream(target1)) {
            fin.getChannel().transferTo(0, source.length(), fout.getChannel());
        } catch (IOException e) {
            e.printStackTrace();
        }
        now = System.currentTimeMillis();
        System.out.println("NIO transfer takes: " + (now - pre));
        pre = now;
        try {
            Files.copy(source.toPath(), target2.toPath());
        } catch (IOException e) {
            e.printStackTrace();
        }
        now = System.currentTimeMillis();
        System.out.println("Files.copy takes: " + (now - pre));
        pre = now;
        try (BufferedInputStream bin = new BufferedInputStream(new FileInputStream(source));
             BufferedOutputStream bout = new BufferedOutputStream(new FileOutputStream(target3))) {
            byte[] bs = new byte[512];
            int count = -1;
            while ((count = bin.read(bs)) != -1) {
                bout.write(bs, 0, count);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
        now = System.currentTimeMillis();
        System.out.println("BufferedInputStream takes: " + (now - pre));
    }
}

Output:

Source file is of size: 2147483648
NIO transfer takes: 893
Files.copy takes: 3087
BufferedInputStream takes: 13113

NIO's FileChannel::transferTo has the best performance. It utilizes a direct ByteBuffer, which operates on the file data directly in the OS's kernel space, while Files::copy and BufferedInputStream have to transfer data out of kernel space and back again.

I planned to also test MappedByteBuffer, but it doesn't support mapping more than 2 GB (precisely 2^31 - 1 bytes) into memory at once, so I gave it up. But basically, I guess it would have about the same performance as FileChannel::transferTo.

But currently I don't have a clue why BufferedInputStream is so much worse than Files.copy. Let's explore that later.

Please note that direct buffers are allocated outside of the Java heap, and they are only reclaimed at a full GC. So we have to use them carefully to avoid OOM. To adjust the max direct memory size, we can use:

-XX:MaxDirectMemorySize=512M

Create blog with Hexo on Git Page

After comparing several options, I chose Hexo to build my personal blog. And for simplicity, I decided to use GitHub Pages to host it.

Basically, you need to maintain two Git repositories: one for Hexo itself, and another for the generated static files. The second is needed because GitHub Pages requires its content to be pushed to a repository with a specific name.

Generate the static files:

hexo generate

Copy the generated files to the deploy folder (to be pushed to GitHub):

# the first folder is where the static files are generated
# you may want to add the "-n" option first to perform a dry run and preview the changes
rsync -avzh --delete --exclude '.git' <SOME_PATH>/blog/public/ <SOME_PATH>/blog_deploy/

rsync usage reference

Using GoAccess to monitor your nginx access log

Suppose you've built a website; you must be curious about how much traffic your site attracts and how people are using it.

You use Nginx as the front server, and you know you can analyze its access log.

Analyzing plain text from scratch is hard, and this is where GoAccess, which is amazing, comes in.

Here is how it looks.

It's straightforward to set up GoAccess by following the official Get Started guide. I'll just go through several points that you may need to take care of.

1. First, put the configuration file in the right place

According to the doc:

The configuration file is located under ~/.goaccessrc or %sysconfdir%/goaccess.conf where %sysconfdir% is either /etc/, /usr/etc/ or /usr/local/etc/.

It's a good idea to make changes in the config file; otherwise you have to pass them in every time you invoke goaccess on the command line.

2. You probably need to change the log-format for nginx

If you have several servers (virtual hosts in Apache terms) configured in Nginx, you may need to change GoAccess's default COMBINED log format.

The COMBINED format is

%h %^[%d:%t %^] "%r" %s %b "%R" "%u"

and it does work perfectly with Nginx’s default access log format, which is

'$remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent"'

However, as you can see, it doesn't contain any info about virtual hosts. To make GoAccess analyze virtual hosts as well, we need to change both Nginx's access log format and GoAccess's log format.

Here is what I did. For backward compatibility, I just switched Nginx's $remote_user to $host:

'$remote_addr - $host [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent"'

and changed GoAccess's log-format to

%h %^ %v [%d:%t %^] "%r" %s %b "%R" "%u"

Please note that when using a custom log-format, we need to specify date-format and time-format explicitly.
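As a quick sanity check that the two formats line up: after the swap, the virtual host is the third whitespace-separated field of each access log line, which is what %v in the GoAccess format picks up. The log line below is made up purely for illustration:

```shell
# A fabricated access-log line in the modified format ($host in place of $remote_user).
line='203.0.113.7 - blog.example.com [20/Apr/2019:09:58:42 +0800] "GET / HTTP/1.1" 200 512 "-" "curl/7.58.0"'
echo "$line" | awk '{print $3}'   # prints the virtual host: blog.example.com
```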

That’s all for GoAccess for now.

Import spring-framework into Eclipse

Basically, you just need to follow this guide from the spring-framework GitHub page.

Note that I checked out the master branch.

From the official guide, there are 6 steps:

  1. Precompile spring-oxm with ./gradlew :spring-oxm:compileTestJava
  2. Import into Eclipse (File -> Import -> Gradle -> Existing Gradle Project -> Navigate to directory -> Select Finish)
  3. If prompted, exclude the spring-aspects module (or after the import by closing or deleting the project)
  4. In the spring-oxm project, add the two folders (castor and jaxb) in build/generated-sources to the build path (right click on them and select Build Path -> Use as Source Folder)
  5. To apply project specific settings run ./gradlew eclipseBuildship
  6. Code away

Here I record problems encountered which may be helpful to someone else.

  1. Precompile spring-oxm with ./gradlew :spring-oxm:compileTestJava

I had network problems downloading the Gradle distribution online; it was extremely slow.

To work around it, I first used the great tool aria2c to download the Gradle zip file locally.

aria2c -x 8 -s 8 -k 1M  https://services.gradle.org/distributions/gradle-4.10.3-bin.zip

It opens 8 connections to download the target file in parallel.

Then, locate the gradle-wrapper.properties file under spring-framework/gradle/wrapper and modify its distributionUrl property:

#distributionUrl=https\://services.gradle.org/distributions/gradle-4.10.3-bin.zip
distributionUrl=file:///home/benson/Downloads/gradle-4.10.3-bin.zip

This way, gradlew will pick up the zip file from your local disk.

  4. In the spring-oxm project, add the two folders (castor and jaxb) in build/generated-sources to the build path (right click on them and select Build Path -> Use as Source Folder)

Steps 2 and 3 went through successfully. But then I couldn't find the build folder under spring-oxm in the Eclipse package explorer.

To show it, you need to ensure that:

  • The Java Element Filters in the package explorer have "Gradle build folder" unchecked.

  • Resource > Resource Filters in the project properties doesn't have a rule excluding the build folder.

Besides this, I also encountered the error:

xxx is not accessible due to restriction on required library C:\Program Files\Java\jdk1.8.0_181\jre\lib\rt.jar

I found a solution for this on Stack Overflow here.

I quote the solution here

  1. Go to the Build Path settings in the project properties.
  2. Remove the JRE System Library
  3. Add it back; Select “Add Library” and select the JRE System Library. The default worked for me.

Lastly, there was another error under the project spring-beans. It says:

GroovyDynamicElementReader cannot be resolved to a type

You need to install the corresponding Groovy plugin to resolve this problem.

Monitor JVM Process Remotely

Monitor JVM from command line locally

You can monitor your target JVM process with many tools, say jstat, jinfo, jstack, jmap and so on.

First find the vmid with

jps -l

and then run

jinfo <vmid>

But the first time, you may get the following error:

Error attaching to process: sun.jvm.hotspot.debugger.DebuggerException: Can’t attach to the process … not permitted …

After some googling, we can solve it with the following (it's caused by the Linux kernel's Yama ptrace security restriction):

echo 0 | sudo tee /proc/sys/kernel/yama/ptrace_scope

Monitor a JVM with graphical tools remotely

Setup JMX RMI

Besides using commands locally, we can also use graphical tools like JConsole or VisualVM to monitor a JVM process remotely. The latter is a replacement for the former.

To enable such remote monitoring, you need to make your remote JVM process ready for remote JMX RMI connections.

Here is the official reference.

I summarize my steps as following.

Copy JRE_HOME/lib/management/management.properties to your home directory:

com.sun.management.jmxremote.port=xxx
com.sun.management.jmxremote.ssl=false
com.sun.management.jmxremote.authenticate=true
com.sun.management.jmxremote.password.file=/home/benson/jmxremote.password
com.sun.management.jmxremote.access.file=/home/benson/jmxremote.access
# this one is very important, let's talk about it later
com.sun.management.jmxremote.rmi.port=xxx

Copy JRE_HOME/lib/management/jmxremote.password.template to your home directory as jmxremote.password:

monitorRole xxx
controlRole xxx

For security reasons, you have to make it readable/writable by its owner only:

chmod 600 jmxremote.password
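You can confirm the resulting permissions with stat. The snippet below is self-contained (the touch only creates a placeholder file; on the server the file already exists from the copy step above):

```shell
touch jmxremote.password          # placeholder; yours was copied from the JRE template
chmod 600 jmxremote.password
stat -c '%a' jmxremote.password   # prints: 600
```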

Copy JRE_HOME/lib/management/jmxremote.access to your home directory:

monitorRole   readonly
controlRole   readwrite \
              create javax.management.monitor.*,javax.management.timer.* \
              unregister

Start your jar with your management file specified:

java -jar -Dcom.sun.management.config.file=$HOME/management.properties -Djava.rmi.server.hostname=xx.xx.xx.xx xxx.jar

Please note that java.rmi.server.hostname is also important here. According to the official document:

By default, the remote stubs for locally created remote objects that are sent to client contains the IP address of the local host in dotted-quad format. For remote stubs to be associated with a specific interface address, the java.rmi.server.hostname system property must be set to IP address of that interface.

In my case, when I started my remote JVM without java.rmi.server.hostname set to the remote public IP, JConsole (started with -debug) gave me this error:

java.rmi.ConnectException: Connection refused to host: 127.0.0.1; nested exception is: 
java.net.ConnectException: Connection refused (Connection refused)
at sun.rmi.transport.tcp.TCPEndpoint.newSocket(TCPEndpoint.java:619)

As I mentioned earlier, if you are in a serious production environment (with strict firewall rules), you probably need to set com.sun.management.jmxremote.rmi.port as well.

According to the official document:

setting this property(com.sun.management.jmxremote.port=portNum) publishes an additional RMI connector in a private read-only registry at the specified port using the name, jmxrmi. The port number to which the RMI connector will be bound using the system property:
com.sun.management.jmxremote.rmi.port

If this property is not specified, a random unused port will be picked. As a result, if you use a whitelist for your firewall rules, connections to your remote JVM process will be blocked.

Lastly, please note that when adding a JMX connection in VisualVM, you need to specify com.sun.management.jmxremote.port rather than com.sun.management.jmxremote.rmi.port. The former should be where the RMI registry listens.

Connect to remote server with jvisualvm

It's straightforward to set up your jvisualvm client. Just note that if you followed my previous steps, you need to specify the credentials configured in the jmxremote.password file.

Setup Jstatd (Optional)

Like JMX, jstatd also gives you the ability to monitor a JVM, but it's much more limited. There is a comparison between JMX and jstatd here.

In most cases, JMX is enough. The reason I need to set up jstatd is that the Visual GC plugin is based on it. The heap graph that comes with plain VisualVM is too simple for meaningful JVM memory monitoring.

Starting jstatd directly will give you an error:

Could not create remote object
access denied ("java.util.PropertyPermission" "java.rmi.server.ignoreSubClasses" "write")
java.security.AccessControlException: access denied ("java.util.PropertyPermission" "java.rmi.server.ignoreSubClasses" "write")
at java.security.AccessControlContext.checkPermission(AccessControlContext.java:372)
...

You can refer to the official documents jstatd and Policy File Syntax.

Basically, you first need to set up a policy file:

grant codebase "file:${java.home}/../lib/tools.jar" {
    permission java.security.AllPermission;
};

The above config uses Property Expansion. Please note that

If a property can’t be expanded in a grant entry, permission entry, or keystore entry, that entry is ignored.

Then start jstatd with

jstatd -J-Djava.security.policy=/home/benson/jmxremote/jstatd.all.policy  -J-Djava.rmi.server.hostname=xxx

You may also specify -J-Djava.rmi.server.logCalls=true for debugging purposes.
Please note that you need to specify java.rmi.server.hostname as before.

By default, the above command starts jstatd with an internal RMI registry bound to the default port 1099. Besides that, it will also pick a random port for the RMI connector.

Note: If your JVM process is behind a firewall, you need to take special care to open this random port to your monitoring client.

Setup Nginx for MSA Development

Problems to solve

Our product consists of several sub-systems. Each sub-system is developed and maintained by a different team. Code-wise, the entire product is organized with Maven in one big Git repository (actually, it would be better to split it), and each sub-system is maintained as a Maven module.

Originally, to set up an environment for development, we had to build all modules besides the ones we are truly responsible for. The drawback is obvious: it takes too much time to build the whole project, and as the number of modules increases, this problem gets worse. In addition, to incorporate changes from other modules during development, we have to recompile those modules each time. Although we can use Maven to build only partial or specific modules, it's still a bit annoying.
Last but not least, it can be quite difficult to set up everything you need if a module has many indirect dependencies, which may require different configurations to start up.

Solution

To mitigate this problem, we maintain a running product instance containing all sub-systems, which is updated on a regular basis (say, daily). Let's call this environment the "development environment". We then set up Nginx locally to route the requests generated during development to the right place. Specifically, say I'm developing module A; I instruct the Nginx instance to route all requests for module A to my local web server, while routing all other requests to the development environment.

In this way, we only need to care about the module we develop. Other modules are built and updated automatically.

Setup Nginx for routing

server {
    # make your nginx listen to 8089
    listen 8089 default_server;

    root /var/www/html;

    index index.html index.htm;

    server_name _;

    location / {
        proxy_set_header Host $http_host;
        # suppose your local server listens to 8080
        proxy_pass http://localhost:8080;
        # enable intercepting HTTP error codes
        proxy_intercept_errors on;
        # internal redirect to the error page defined by "@404found" when the error code is 404, overriding the return code
        error_page 404 = @404found;
        # the following config is needed for websocket proxying
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "Upgrade";
    }

    location @404found {
        proxy_set_header Host $http_host;
        proxy_pass <URL for your development environment>;
        # the following config is needed for websocket proxying
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "Upgrade";
    }
}

With the above Nginx config, you access your local development server via localhost:8089 and the Nginx server does the routing for you.

Debug Nginx Setting

It helps a lot to be able to debug your Nginx settings; it can be annoying when you make a wrong configuration but don't know where it is. You can check the syntax with nginx -t, and refer to this URL for further debugging.

Use tmux to manage your remote server

Concepts: one session can have multiple windows, and one window can have multiple panes.

Suppose you have just SSHed into your remote server.
First, create a new tmux session. By default, it has one window with one pane in it.

tmux

Give the session a meaningful name, so that you can attach to it easily later:

<prefix> $

Give the current window a meaningful name

<prefix> ,

Split the current pane:

<prefix> % // horizontally
<prefix> " // vertically

Oops, you created a new pane by mistake. Let's close it:

<prefix> x

And now you have too many panes in one window; let's create a new window:

<prefix> c

Switch between windows

<prefix> n // to next
<prefix> p // to previous

And close a window

<prefix> & // close a window

Now you have your work done. While closing your connection to the remote server, you want your program to keep running.
Or, you want to pick up where you were later. It's easy; just detach the session:

<prefix> d

Several days later, you come back and want to resume your work. First, list all sessions alive in the background:

tmux ls

Attach to your desired session by name

tmux a -t <name>

These commands are pretty much all you need for tmux.
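One last note: the <prefix> used throughout is Ctrl-b by default. Session-wide defaults live in ~/.tmux.conf; here is a minimal example (every line is optional, and the values are just illustrative):

```
# ~/.tmux.conf
# keep more scrollback history per pane
set -g history-limit 10000
# number windows starting from 1 instead of 0
set -g base-index 1
```

Reload it into a running session with <prefix> : and then `source-file ~/.tmux.conf`.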