PageRank procedure hangs / deadlocks when scoring more than a million nodes

I first computed the node scores with the following query:

MATCH (n:账号)
WITH collect(n) as nodes
CALL apoc.algo.pageRank(nodes) YIELD node,score
SET node.influenceScore=score

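However, this query hangs when run in the browser (there are 2.54 million nodes with this label), and running it through a connection pool raises a deadlock error. I then tried computing the scores with the batch-processing procedure, but the result was the same: it hangs and reports a deadlock. Also, when I looked at the execution plan, it seemed that no batching was actually being performed: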

CALL apoc.periodic.iterate('MATCH (n:账号)
WITH collect(n) as nodes RETURN nodes', 'WITH {nodes} as n CALL apoc.algo.pageRank(n) YIELD node,score
SET node.influenceScore=score', {batchSize:10,parallel:true})

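With a few hundred thousand nodes the first statement could just about produce a result, but with more than a million nodes it no longer completes. This is urgent, so any pointers toward a solution would be appreciated. Also, this is a local test database: once the browser hangs on the query above, refreshing the page or opening a new one sometimes fails to connect to the database. What causes that? Clicking STOP gets no response. (Screenshot attached: TIM截图20180920093138.png)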

bingo #1 • 7 years ago

It simply does not support large amounts of data.

kingofneo4j #2 • 7 years ago

Maybe your heap size is not large enough. If you have enough memory, you can increase this parameter and try again. I don't use the Neo4j API to compute PageRank; I use Spark instead. The way Neo4j computes PageRank is not what I want, and I need to make some modifications to it. Also, if you have a lot of data, Neo4j takes a very long time to compute it, which is annoying. So I chose Spark to implement the modified algorithm; it takes less time because Spark is a distributed computing system.

crazyyanchao #3 • 7 years ago

@kingofneo4j Does this mean I can use Spark to compute PageRank scores on a Neo4j dataset?

kingofneo4j #4 • 7 years ago

@crazyyanchao Yes, with Spark GraphX. But you need to do the following:

First, load the data from Neo4j into the Spark cluster.
Then compute PageRank on the Spark cluster.
Lastly, write the results back into Neo4j.

This is how I compute PageRank.

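crazyyanchao #5 • 7 years ago

@kingofneo4j If I do it that way, won't I need to spend extra resources, for example setting up a Spark cluster first? Using that cluster only to compute PageRank scores seems a bit wasteful. Is there a way to use Spark's graph computation framework directly on top of Neo4j?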

graphway #6 • 7 years ago

You have run out of memory. Download the latest ALGO extension package and try calling PageRank a different way:

CALL algo.pageRank('账号', null, {iterations:20, dampingFactor:0.85, write: true, writeProperty:"pagerank"}) YIELD nodes, iterations, loadMillis, computeMillis, writeMillis, dampingFactor, write, writeProperty

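This call runs the computation over all relationships among the nodes labeled 账号 and writes the result to a new node property, pagerank. ALGO's PageRank can handle large graphs with billions of nodes or more, though of course it also needs more memory, and its call signature and parameters differ from the APOC version.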
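
For readers unsure what the iterations and dampingFactor settings control, here is a minimal sketch of textbook power-iteration PageRank in plain Python. It is not the ALGO library's implementation, and its normalization may differ, so the numbers are not expected to match what the procedure writes. The function name pagerank and the tiny edge list are made up for illustration.

```python
def pagerank(edges, iterations=20, damping_factor=0.85):
    """Textbook power-iteration PageRank over (source, target) edge pairs.

    Illustrative only: scores are normalized to sum to 1, which is an
    assumption of this sketch and not necessarily what Neo4j does.
    """
    nodes = sorted({node for edge in edges for node in edge})
    if not nodes:
        return {}

    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    n = len(nodes)
    scores = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # Rank held by nodes without outgoing links is spread evenly.
        dangling = sum(scores[v] for v in nodes if not out_links[v])
        base = (1.0 - damping_factor) / n + damping_factor * dangling / n
        new_scores = {node: base for node in nodes}
        # Every other node splits its damped rank among its targets.
        for src in nodes:
            if out_links[src]:
                share = damping_factor * scores[src] / len(out_links[src])
                for dst in out_links[src]:
                    new_scores[dst] += share
        scores = new_scores

    return scores


if __name__ == "__main__":
    # Hypothetical four-account graph: a cycle a -> b -> c -> a, plus d -> a.
    demo_edges = [("a", "b"), ("b", "c"), ("c", "a"), ("d", "a")]
    for account, score in sorted(pagerank(demo_edges).items()):
        print(account, round(score, 4))
```

Each iteration touches every relationship once and keeps one score per node in memory, which is why both running time and heap usage grow with the size of the graph.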

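crazyyanchao #7 • 7 years ago

graphway's answer requires installing an additional algorithm package before it will work. The algorithms under apoc.* are not as well optimized as the standalone package, and the official recommendation is to use the specially optimized algorithm package, which can be downloaded here: https://github.com/neo4j-contrib/neo4j-graph-algorithms/tree/3.4#efficient-graph-algorithms-for-neo4j Also, with the optimized package, computing over several million nodes is not slow.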