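Analyzing Java heap dumps

This article helps you analyze a heap dump with Eclipse MAT. It covers how to parse large heap files and what to look for.

When an OutOfMemory exception occurs, a .hprof file is generated if the following parameter is set in neo4j.conf.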

dbms.jvm.additional=-XX:+HeapDumpOnOutOfMemoryError

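You can also adjust the following settings to specify the directory path, but make sure enough disk space is available when such an error occurs.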

dbms.jvm.additional=-XX:HeapDumpPath=/var/tmp/dumps
dbms.jvm.additional=-XX:OnOutOfMemoryError="tar cvzf /var/tmp/dump.tar.gz /var/tmp/dumps;split -b 1G /var/tmp/dump.tar.gz;"
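
As a quick sanity check, the following minimal Python sketch collects the dbms.jvm.additional entries from the text of a neo4j.conf and reports whether the heap dump flag is present. The function names are hypothetical, and the parsing assumes simple key=value lines with # comments.

```python
def jvm_additional_flags(conf_text):
    """Return the values of all dbms.jvm.additional entries, in file order."""
    flags = []
    for line in conf_text.splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        # Split on the first "=" only, since JVM flags may contain "=" too.
        key, sep, value = line.partition("=")
        if sep and key.strip() == "dbms.jvm.additional":
            flags.append(value.strip())
    return flags


def heap_dump_on_oom_enabled(conf_text):
    """True if -XX:+HeapDumpOnOutOfMemoryError is among the JVM flags."""
    return "-XX:+HeapDumpOnOutOfMemoryError" in jvm_additional_flags(conf_text)


# Inline sample standing in for the contents of neo4j.conf.
sample = """\
# JVM settings
dbms.jvm.additional=-XX:+HeapDumpOnOutOfMemoryError
dbms.jvm.additional=-XX:HeapDumpPath=/var/tmp/dumps
"""
print(heap_dump_on_oom_enabled(sample))
```

In practice the text would be read from the neo4j.conf of your installation.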

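The .hprof file is an image of the heap of the Java process running on your system. Its structure depends on the vendor of the JVM you use to run Neo4j.

Oracle JDK and OpenJDK generate hprof files, which can be analyzed with most common tools. IBM heap dumps need to be parsed with IBM Heap Analyzer or other proprietary tools.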

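Change the settings in MemoryAnalyzer.ini

On your local machine

You need to allocate to the process an amount of memory comparable to the size of the heap dump file.

For example, if the heap is about 15GB, allocate 17GB of memory.

For large heap dumps (> 25G), see the next section.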

Edit MemoryAnalyzer.ini (on macOS, it is located in /Applications/mat.app/Contents/Eclipse/MemoryAnalyzer.ini)

Add or change the settings

-Xms10G
-Xmx25G
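
The sizing rule above can be sketched in Python. This is only an illustration: the function name is hypothetical, and the 10% headroom (rounded up to a whole GB) is an assumption chosen so that a 15GB dump maps to 17GB as in the example, not an official MAT recommendation.

```python
import math


def suggest_mat_xmx_gb(dump_size_bytes, headroom=0.10):
    """Suggest a -Xmx value (whole GB) for MAT given the heap dump size.

    headroom is an assumed safety margin, not a documented MAT rule.
    """
    gib = dump_size_bytes / 1024**3
    return math.ceil(gib * (1 + headroom))


# A 15 GB dump; in practice the size could come from os.path.getsize(path).
dump_size = 15 * 1024**3
print(f"-Xmx{suggest_mat_xmx_gb(dump_size)}G")
```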

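On a remote machine

It is best to upload the heap dump to a cloud instance (AWS, GCP, etc.) with plenty of disk and memory. If you choose AWS, use a spot instance.

You will then need to attach EBS storage: create a 250GB volume and attach it to the EC2 instance. Format the volume and mount it on the Amazon Linux instance.

Note down the instanceid and storageid to make sure the resources are properly released after use.

If the heap dump is about 61GB, parsing it requires roughly twice that size in additional disk space for the index files, as shown below.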

$ du -ch java_pid19820*
116M	java_pid19820.a2s.index
5.6G	java_pid19820.domIn.index
 17G	java_pid19820.domOut.index
 61G	java_pid19820.hprof #original heap dump
256K	java_pid19820.i2sv2.index
 11G	java_pid19820.idx.index
 29G	java_pid19820.inbound.index
197M	java_pid19820.index
4.5G	java_pid19820.o2c.index
 12G	java_pid19820.o2hprof.index
 11G	java_pid19820.o2ret.index
 29G	java_pid19820.outbound.index
988K	java_pid19820.threads
 68K	java_pid19820_Component_Report_sel.zip
180G	total
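
In this listing, the index files written next to the dump add up to about twice the size of the dump itself (roughly 119G of indexes for a 61G dump). A minimal Python sketch of a pre-flight check follows; the function name and the default factor of 2 are assumptions based on this single example.

```python
import shutil


def can_parse_in_place(dump_size_bytes, free_bytes, index_factor=2.0):
    """True if free_bytes can hold the index files MAT writes next to the dump.

    index_factor is the assumed ratio of index size to dump size.
    """
    return free_bytes >= dump_size_bytes * index_factor


GB = 1024**3
# Free space on the volume holding the current directory.
free = shutil.disk_usage(".").free
print(can_parse_in_place(61 * GB, free))
```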

Prerequisites: install Java and make sure you have 250GB of free space
Download the MemoryAnalyzer tool for Linux
Extract it to a directory
Edit MemoryAnalyzer.ini to adjust both the -Xms and -Xmx memory settings

-startup
plugins/org.eclipse.equinox.launcher_1.5.0.v20180512-1130.jar
--launcher.library
plugins/org.eclipse.equinox.launcher.gtk.linux.x86_64_1.1.700.v20180518-1200
-vmargs
-Xms30G
-Xmx100G
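
Editing the two values can also be scripted. The Python sketch below rewrites the -Xms and -Xmx lines of a MemoryAnalyzer.ini given as text, appending them (and -vmargs) if they are missing. The function name is hypothetical, and it assumes one option per line, as in the file above.

```python
def set_mat_heap(ini_text, xms, xmx):
    """Return ini_text with the -Xms/-Xmx lines replaced by the given values."""
    # Drop any existing heap settings.
    lines = [line for line in ini_text.splitlines()
             if not line.startswith(("-Xms", "-Xmx"))]
    # JVM options are only passed to the JVM after the -vmargs marker.
    if "-vmargs" not in lines:
        lines.append("-vmargs")
    # Everything after -vmargs is a JVM option, so appending at the end is safe.
    lines += [f"-Xms{xms}", f"-Xmx{xmx}"]
    return "\n".join(lines) + "\n"


ini = """\
-startup
plugins/org.eclipse.equinox.launcher_1.5.0.v20180512-1130.jar
-vmargs
-Xmx1024m
"""
print(set_mat_heap(ini, "30G", "100G"), end="")
```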

Parse the file on the remote machine

This step is optional if you run Eclipse MAT on your local machine and have enough resources. The index files will be created when opening the heapdump file if they are missing.

Run ./ParseHeapDump.sh heapdump.hprof

The script is located in the mat folder extracted from the Eclipse MAT tar.gz archive.

To speed up the transfer of the resulting files to your local machine, you can use rsync over ssh. An interrupted transfer can be resumed (the -P flag keeps partially transferred files), and the -z flag enables compression.

Example

# on the remote machine
$ mkdir ${REMOTE_DIR}/parsed_files
$ mv *.index ${REMOTE_DIR}/parsed_files/

# on your local machine
$ rsync -P -e "ssh -i ${PATH_TO_KEY}" ec2-user@${REMOTE_IP}:${REMOTE_DIR}/heapdump.zip .
$ rsync -Prz -e "ssh -i ${PATH_TO_KEY}" ec2-user@${REMOTE_IP}:${REMOTE_DIR}/parsed_files/ .

Open Eclipse MAT

To open the heapdump, go to File > Open Heap Dump (Not Acquire Heap Dump) and browse to your heapdump location.

There is no need to open an existing report; press Cancel if a modal dialog appears.

In the Overview tab, left-click on the largest object(s)

Choose "list objects" > "with outgoing references".

It will open a new tab with the list of all the elements.

Expand the first level then expand everything at the second level.

Cypher query strings

There are a lot of objects in a heap dump; there is no need to go through the Object[], byte[], String instances, etc.

You might want to filter for the classes whose names contain PreParsed. Once found, list the outgoing references of the one that has the most instances to cross-check. A new tab will open and you will be able to see the rawStatement of the Cypher queries.

Check the thread dumps

With thread dumps that have been taken before the heap dump

The garbage collector will not be able to collect the thread objects until the threading system also dereferences the object, which won’t happen if the thread is alive.

So if a large amount of heap memory is in use, there is probably a long-running thread associated with your large object.

To find it, look for the thread name in the thread dumps.

$ grep neo4j.BoltWorker-394 *

5913-tdump-201903291746.log:"neo4j.BoltWorker-394 [bolt]" #620 daemon prio=5 os_prio=0 tid=0x00007fb737619800 nid=0x8cec waiting on condition [0x00007fb38d00f000]
5913-tdump-201903291751.log:"neo4j.BoltWorker-394 [bolt] [/www.xxx.yyy.zzz:57570] " #620 daemon prio=5 os_prio=0 tid=0x00007fb737619800 nid=0x8cec runnable [0x00007fb38d00b000]
5913-tdump-201903291756.log:"neo4j.BoltWorker-394 [bolt] [/www.xxx.yyy.zzz:57570] " #620 daemon prio=5 os_prio=0 tid=0x00007fb737619800 nid=0x8cec runnable [0x00007fb38d00b000]
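
To compare the same thread across several dumps, the header lines can be parsed. The Python sketch below extracts the thread name, the native thread id (nid) and the state from lines like the ones above. The regular expression is an assumption based on this HotSpot output format and may need adjusting for other JVMs or versions.

```python
import re

# Matches: "<name>" #<n> ... nid=0x... <state> [0x...]
HEADER = re.compile(
    r'"(?P<name>[^"]*)"\s+#\d+.*?\bnid=(?P<nid>0x[0-9a-f]+)\s+(?P<state>.*?)\s+\['
)


def parse_thread_header(line):
    """Return (name, nid as decimal, state), or None if the line does not match."""
    m = HEADER.search(line)
    if m is None:
        return None
    # On Linux, the decimal nid is the OS thread id shown by tools such as top -H.
    return m.group("name").strip(), int(m.group("nid"), 16), m.group("state")


line = ('"neo4j.BoltWorker-394 [bolt]" #620 daemon prio=5 os_prio=0 '
        'tid=0x00007fb737619800 nid=0x8cec waiting on condition [0x00007fb38d00f000]')
print(parse_thread_header(line))
```

A thread that shows up with the same nid in several consecutive dumps, as here, is a candidate for the long-running thread.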

Note that the thread dumps are included in the heap dump. They are available in plain text in the .threads file, but the STATE information is not available in Eclipse MAT. You can get it with other tools such as VisualVM.

$ head -10 java_pid19820.threads
Thread 0x7fd64b0e1610
  at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.addConditionWaiter()Ljava/util/concurrent/locks/AbstractQueuedSynchronizer$Node; (AbstractQueuedSynchronizer.java:1855)
  at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(J)J (AbstractQueuedSynchronizer.java:2068)
  at java.util.concurrent.LinkedBlockingQueue.poll(JLjava/util/concurrent/TimeUnit;)Ljava/lang/Object; (LinkedBlockingQueue.java:467)
  at com.hazelcast.util.executor.CachedExecutorServiceDelegate$Worker.run()V (CachedExecutorServiceDelegate.java:210)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V (ThreadPoolExecutor.java:1149)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run()V (ThreadPoolExecutor.java:624)
  at java.lang.Thread.run()V (Thread.java:748)
  at com.hazelcast.util.executor.HazelcastManagedThread.executeRun()V (HazelcastManagedThread.java:76)
  at com.hazelcast.util.executor.HazelcastManagedThread.run()V (HazelcastManagedThread.java:92)
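
Because the .threads file is plain text, it can also be searched outside of MAT. The following Python sketch groups the stack frames by thread address, assuming the layout shown above (a "Thread 0x..." line followed by indented "at ..." frames). The function name is hypothetical.

```python
def parse_threads_file(text):
    """Map each thread address to its list of stack frames (top frame first)."""
    stacks = {}
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("Thread "):
            # Second token is the thread address, e.g. 0x7fd64b0e1610.
            current = stripped.split()[1]
            stacks[current] = []
        elif stripped.startswith("at ") and current is not None:
            stacks[current].append(stripped[3:])
    return stacks


# Shortened sample in the same layout as the .threads file.
sample = """\
Thread 0x7fd64b0e1610
  at java.util.concurrent.LinkedBlockingQueue.poll(JLjava/util/concurrent/TimeUnit;)Ljava/lang/Object; (LinkedBlockingQueue.java:467)
  at java.lang.Thread.run()V (Thread.java:748)
"""
stacks = parse_threads_file(sample)
# List the threads whose stack mentions a given class name.
print([addr for addr, frames in stacks.items()
       if any("LinkedBlockingQueue" in f for f in frames)])
```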
