知识库

解析 query.log 的一种方法

当通过 Neo4j Enterprise 参数 dbms.logs.query.enabled 启用了 query.log 时,随附的 bash 脚本可用于快速解析日志,并根据总执行时间识别出前 10 条最耗时的查询;如果同时启用了 dbms.logs.query.time_logging_enabled,则还会报告基于规划时间、CPU 时间和等待时间的前 10 条最耗时查询。

bash 脚本如下

#!/bin/bash
# Copyright (c) 2002-2018 "Neo Technology,"
# Network Engine for Objects in Lund AB [http://neotechnology.com]
# This file is a commercial add-on to Neo4j Enterprise Edition.

# only parse those lines where the 3rd field is 'INFO'
# count total number of lines in query.log which report INFO in the 3rd field

parse() {
logfile=$1
if [ -e $logfile ]; then
    # log file exits
    awk '$3== "INFO" {print "First Query Reported at: " $1 " " $2}' $logfile | head -1
    awk '$3== "INFO" {print " Last Query Reported at: " $1 " " $2}' $logfile | tail -1


    awk '$3== "INFO" {count++;  total=total+$4 } END{print "\n*******EXECUTION:******* \nTotal # of Completed Queries: " count "\n       Total Duration (msec): " total "\n   Avg of all Queries (msec): " total/count "\n" }' $logfile

    # print longest query
    printf "Top 10 Longest Queries longest on top and leading line number: \n"
    printf "Note: Queries which are multi-line will only report first line of query !!!\n\n\n"
    awk '$3== "INFO" {print "line:" NR "\t" $0}' $logfile | sort -n -r -k 5 | head -10

    pcw=`grep "INFO" $logfile | grep "(planning:" | head -1`
    if [ ! -z "$pcw" ]; then
          # only enter this block if conf/neo4j.conf has
          #     dbms.logs.query.time_logging_enabled=true
          # which thus causes field 6 to appear as `(planning:...........`
          #
          # produce metrics on planning, CPU and waiting
          #PLANING
          awk '$3== "INFO" && $6== "(planning:" {count++;  total=total+$7 } $7== "0," {planzero++;} END{print "\n\t\t*******PLANNING:******* \n Total # of Completed Queries: " count "\n        Total Duration (msec): " total "\n    Avg of all Queries (msec): " total/count "\nNumber of Queries NOT Planned: " planzero "    " (planzero/count)*100 "%\n" }' $logfile

          # print longest query
          awk '$3== "INFO" {print "line:" NR "\t" $0}' $logfile | sort -n -r -k 8 | head -10



          #CPU
          awk '$3== "INFO" && $8== "cpu:" {count++;  total=total+$9 } END{print "\n\t\t*******CPU:******* \nTotal # of Completed Queries: " count "\n       Total Duration (msec): " total "\n   Avg of all Queries (msec): " total/count "\n" }' $logfile
awk '$3== "INFO" {print "line:" NR "\t" $0}' $logfile | sort -n -r -k 10 | head -10

          #WAITING
          awk '$3== "INFO" &&$10== "waiting:" {count++;  total=total+$11 } $11== "0)" {nowait++;} END{print "\n\t\t\t*******WAITING:******* \n Total # of Completed Queries: " count "\n        Total Duration (msec): " total "\n    Avg of all Queries (msec): " total/count "\nNumber of Queries NOT Waiting: " nowait "      " (nowait/count)*100 "%\n" }' $logfile
awk '$3== "INFO" {print "line:" NR "\t" $0}' $logfile | sort -n -r -k 12 | head -10
     fi
else
    # logfile does not exist
    printf "\n$logfile does not exist\n"
    printf "Usage: $0 <filename>\n"
fi
}

if [ -z $1 ]; then
        parse "query.log"
  else
        parse $1
fi

运行脚本并传入要分析的 query.log 文件,例如 ./parse_querylog.sh query.log,输出将类似于

  1. dbms.logs.query.time_logging_enabled=false

    First Query Reported at: 2018-04-10 11:48:36.425+0000
     Last Query Reported at: 2018-04-10 11:49:01.213+0000
    
    *******EXECUTION:*******
    Total # of Completed Queries: 6
           Total Duration (msec): 2521
       Avg of all Queries (msec): 420.167
    
    Top 10 Longest Queries longest on top and leading line number:
    Note: Queries which are multi-line will only report first line of query !!!
    
    
    line:1  2018-04-10 11:48:36.425+0000 INFO  2398 ms: embedded-session             - MATCH (a:` Arbitrary label name that really doesn't matter `) RETURN a LIMIT 0 - {} - {}
    line:2  2018-04-10 11:48:37.106+0000 INFO  48 ms: bolt-session  bolt            neo4j-java/dev          client/127.0.0.1:35796  server/127.0.0.1:7687>   - RETURN 1 - {} - {}
    line:3  2018-04-10 11:48:43.839+0000 INFO  33 ms: bolt-session  bolt            neo4j-java/dev          client/127.0.0.1:35796  server/127.0.0.1:7687>   - match (n) return count(n); - {} - {}
    line:9  2018-04-10 11:49:01.213+0000 INFO  21 ms: bolt-session  bolt            neo4j-java/dev          client/127.0.0.1:35796  server/127.0.0.1:7687>   -
    line:4  2018-04-10 11:48:47.494+0000 INFO  12 ms: bolt-session  bolt            neo4j-java/dev          client/127.0.0.1:35796  server/127.0.0.1:7687>   -
    line:6  2018-04-10 11:48:51.613+0000 INFO  9 ms: bolt-session   bolt            neo4j-java/dev          client/127.0.0.1:35796  server/127.0.0.1:7687>   -
  2. dbms.logs.query.time_logging_enabled=true

    注意:为简洁起见,每个章节中 10 条查询的列表已缩减为 2 行。此外,查询的 Cypher 语句也已简化)

    First Query Reported at: 2018-01-24 07:58:13.360+0000
     Last Query Reported at: 2018-01-24 15:19:40.897+0000
    
    *******EXECUTION:*******
    Total # of Completed Queries: 13655
           Total Duration (msec): 1207191
       Avg of all Queries (msec): 88.4065
    
    Top 10 Longest Queries longest on top and leading line number:
    Note: Queries which are multi-line will only report first line of query !!!
    
    line:12100      2018-01-24 08:49:14.144+0000 INFO  10082 ms: (planning: 0, cpu: 81, waiting: 0) - bolt-session  bolt    neo4j   neo4j-java/1.4.2-45c2930bc28fac23dda088b300977b804fedf8bb              client/10.21.1.3:56870    server/10.0.1.2:7687>        neo4j - MATCH (n:Person............)
    line:58883      2018-01-24 12:28:17.472+0000 INFO  2530 ms: (planning: 313, cpu: 2520, waiting: 0) - bolt-session       bolt    neo4j   neo4j-java/1.4.2-45c2930bc28fac23dda088b300977b804fedf8bb              client/10.21.21.27:56870        server/10.0.1.2:7687>        neo4j -  MATCH (n:Person)
    
                    *******PLANNING:*******
     Total # of Completed Queries: 13655
            Total Duration (msec): 8798
        Avg of all Queries (msec): 0.644306
    Number of Queries NOT Planned: 11448    83.8374%
    
    line:37438      2018-01-24 10:43:28.532+0000 INFO  2255 ms: (planning: 991, cpu: 2239, waiting: 0) - bolt-session       bolt    neo4j   neo4j-java/1.4.2-45c2930bc28fac23dda088b300977b804fedf8bb              client/10.21.21.45:56870      server/10.0.1.2:7687>        neo4j -  MATCH (n:Person...............)
    line:89641      2018-01-24 15:13:42.191+0000 INFO  513 ms: (planning: 409, cpu: 510, waiting: 0) - bolt-session bolt    neo4j   neo4j-java/1.4.2-45c2930bc28fac23dda088b300977b804fedf8bb              client/10.21.21.45:56870       server/10.0.1.2:7687>        neo4j - MATCH (n:Person...........) {}
    
                    *******CPU:*******
    Total # of Completed Queries: 13655
           Total Duration (msec): 1194341
       Avg of all Queries (msec): 87.4655
    
    line:58883      2018-01-24 12:28:17.472+0000 INFO  2530 ms: (planning: 313, cpu: 2520, waiting: 0) - bolt-session       bolt    neo4j   neo4j-java/1.4.2-45c2930bc28fac23dda088b300977b804fedf8bb              client/10.21.21.27:56870        server/10.0.1.2:7687>       neo4j -  MATCH (n:Person .......)
    line:386        2018-01-24 07:59:54.851+0000 INFO  2359 ms: (planning: 256, cpu: 2350, waiting: 0) - bolt-session       bolt    neo4j   neo4j-java/1.4.2-45c2930bc28fac23dda088b300977b804fedf8bb              client/10.21.21.27:49536        server/10.0.1.2:7687>        neo4j -  MATCH (n:Person {.......})
    
                            *******WAITING:*******
     Total # of Completed Queries: 13655
            Total Duration (msec): 27
        Avg of all Queries (msec): 0.0019773
    Number of Queries NOT Waiting: 13654      99.9927%
    
    line:81         2018-01-24 07:58:33.168+0000 INFO  67 ms: (planning: 0, cpu: 67, waiting: 27) - bolt-session    bolt    neo4j   neo4j-java/1.4.2-45c2930bc28fac23dda088b300977b804fedf8bb              client/10.21.21.27:49536        server/10.0.1.2:7687>        neo4j -  MATCH (n:Person) return n:Deleted
    line:9991       2018-01-24 08:40:16.476+0000 INFO  74 ms: (planning: 0, cpu: 74, waiting: 0) - bolt-session     bolt    neo4j   neo4j-java/1.4.2-45c2930bc28fac23dda088b300977b804fedf8bb              client/10.21.21.27:49536        server/10.0.1.2:7687>        neo4j -  MATCH (n:Person..............)

从第二次运行的输出可以看到日志跨度为 First Query Reported at: 2018-01-24 07:58:13.360+0000Last Query Reported at: 2018-01-24 15:19:40.897+0000。脚本随后报告我们找到了 Total # of Completed Queries: 13655,并提供了基于总/平均执行时间、总/平均规划时间、总/平均 CPU 时间以及总/平均等待时间的前 10 条最耗时查询列表。此外,从输出可以看到“查询规划”并不是一个重要问题,因为 Number of Queries NOT Planned: 11448 83.8374% 表明 83.8374% 的查询是从查询计划缓存中获取的。最后,“锁定”也不是问题,因为 Number of Queries NOT Waiting: 13654 99.9927% 表示 99.9927% 的查询没有花费时间在“等待”状态。

脚本有一些‘注意事项’。
  • 上述脚本会报告日志文件中满足任意‘前 10’结果的查询所在的行号。然而,如果查询是多行的(包括换行符),则仅显示查询的第一行。需要在相应的行号处读取 query.log 才能完整了解该查询。

  • 如果在 Neo4j 刚刚冷启动后立即对 query.log 进行分析,报告的时间可能会被夸大。这是因为查询未命中查询计划缓存或页面缓存所致。

  • 可以通过参数 dbms.logs.query.threshold 配置 query.log 只记录运行时间超过 X 的查询。如果将其设为 0,则会记录所有查询;如果设为 2s,则仅记录运行时间超过 2 秒的查询,这会影响上述结果。

© . This site is unofficial and not affiliated with Neo4j, Inc.