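Quote Fraud

1. Introduction

Insurance quote fraud is the deceptive practice of supplying false or misleading information while obtaining an insurance quote. Individuals or organisations that engage in it deliberately manipulate data such as personal details, assets, or claims history to secure lower premiums.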

“Research finds half of UK consumers think it’s okay to tell ‘little white lies’”

— LexisNexis

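By misrepresenting their circumstances, they aim to trick insurers into offering better rates or cover than they would otherwise qualify for. Insurance quote fraud does not only deceive insurers; it can also affect other policyholders through higher premiums. Insurers use measures such as data validation and cross-checking to detect and prevent it.

2. Scenario

Insurance quote fraud is a significant business problem for insurers worldwide. According to industry reports, fraudulent activity costs the insurance industry billions of dollars every year. A recent study by the Insurance Information Institute indicates that roughly 10-20% of insurance claims involve fraud, and the quote stage is the first point at which fraud can occur. The consequences are far-reaching: reduced profitability for insurers, higher premiums for honest policyholders, and weaker trust in the industry. Detecting and preventing quote fraud has become a top priority for insurers, pushing them to adopt advanced technology and data analytics to mitigate this widespread problem.

3. Solution

To combat insurance quote fraud, companies are turning to advanced technology. One such technology is Neo4j, a graph database with powerful data modelling and analysis capabilities. With Neo4j, insurers can connect and analyse complex relationships in their data to uncover patterns, detect fraud rings, and strengthen fraud detection algorithms. Its graph-based approach helps insurers identify fraudulent activity efficiently, reduce risk, and improve overall operational efficiency in tackling quote fraud.

3.1. How Can Graph Databases Help?

Real-time fraud detection: Neo4j's real-time data processing helps insurers detect and prevent fraud by quickly identifying anomalies and suspicious patterns across quotes, policies, and claims.
Graph data modelling: By modelling data as a graph, Neo4j exposes hidden relationships and patterns between entities such as policyholders, claims, agents, and fraud indicators, helping insurers detect and prevent fraud more accurately.
Network analysis: Neo4j's graph algorithms and traversal capabilities help insurers identify fraud networks and patterns that span multiple policies, claimants, or agents.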

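4. Modelling

This section shows example Cypher queries on a sample graph. The aim is to illustrate what the queries look like and to offer guidance on how to structure your data in a real setting. The examples run on a graph with only a handful of nodes, based on the following data model:

4.1. Data Model

4.1.1 Required Fields

The following fields are required to get started:

Quote Node

firstname: the applicant's first name
surname: the applicant's surname
dob: the applicant's date of birth
postcode: the applicant's postcode
passport: the applicant's passport number
created_date: the date and time the quote or application was submitted

You can add properties to this node to track anything you want to monitor during the quote/application process. In my data model and test data you may notice a change_info property. It is there for demonstration purposes only, to make the changes since the previous quote easier to see. The demo data also carries longitude and latitude properties, which the scoring queries in section 5.4 use to measure the distance between locations.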

NEXT_QUOTE Relationship

diff_seconds: the time difference, in seconds, between the previous quote and the current one.
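As a minimal sketch of how diff_seconds can be derived, the statement below applies duration.inSeconds to two hard-coded timestamps and reads the seconds component, which is the same expression the demo data uses. The timestamps and the names previous_quote and current_quote are illustrative assumptions, not values from the demo graph.

```cypher
// Two illustrative timestamps five minutes apart (assumed values)
WITH datetime("2024-01-01T10:00:00Z") AS previous_quote,
     datetime("2024-01-01T10:05:00Z") AS current_quote
// .seconds gives the whole gap expressed in seconds
RETURN duration.inSeconds(previous_quote, current_quote).seconds AS diff_seconds
```

For a five-minute gap this returns 300, which sits well inside the one-hour window used later in this guide.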

4.2. Demo Data

The following Cypher statement creates the example graph in the Neo4j database:

// Create quote nodes
CREATE (q1:Quote {firstname: "Micheal", surname: "Down", dob: date("1988-02-02"), postcode: "YO30 7DW", longitude: -1.0927426, latitude: 53.96372145, passport: 584699531, created_date: datetime()-duration({years: 1, months: 1, minutes: 9}), change_info: "first quote"})
CREATE (q2:Quote {firstname: "Michael", surname: "Down", dob: date("1988-02-02"), postcode: "YO30 7DW", longitude: -1.0927426, latitude: 53.96372145, passport: 584699531, created_date: datetime()-duration({years: 1, months: 1, minutes: 4}), change_info: "name change ea to ae"})
CREATE (q3:Quote {firstname: "Michael", surname: "Down", dob: date("1988-02-02"), postcode: "YO30 7DW", longitude: -1.0927426, latitude: 53.96372145, passport: 584699531, created_date: datetime()-duration({years: 1, months: 1, minutes: 3}), change_info: "postcode_change"})
CREATE (q4:Quote {firstname: "Michael", surname: "Down", dob: date("1988-02-02"), postcode: "PA62 6AA", longitude: -5.851487, latitude: 56.359258, passport: 584699530, created_date: datetime()-duration({years: 1, months: 1}), change_info: "passport number"})
CREATE (q5:Quote {firstname: "Michael", surname: "Down", dob: date("1988-02-02"), postcode: "PA62 6AA", longitude: -5.851487, latitude: 56.359258, passport: 584699530, created_date: datetime()-duration({months: 1}), change_info: "quote 1yr later"})
CREATE (q6:Quote {firstname: "Michael", surname: "Down", dob: date("1988-02-02"), postcode: "PA62 6AA", longitude: -5.851487, latitude: 56.359258, passport: 584699530, created_date: datetime(), change_info: "quote 1m later"})


// Create all relationships
CREATE (q1)-[:NEXT_QUOTE {diff_seconds: duration.inSeconds(q1.created_date, q2.created_date).seconds}]->(q2)
CREATE (q2)-[:NEXT_QUOTE {diff_seconds: duration.inSeconds(q2.created_date, q3.created_date).seconds}]->(q3)
CREATE (q3)-[:NEXT_QUOTE {diff_seconds: duration.inSeconds(q3.created_date, q4.created_date).seconds}]->(q4)
CREATE (q4)-[:NEXT_QUOTE {diff_seconds: duration.inSeconds(q4.created_date, q5.created_date).seconds}]->(q5)
CREATE (q5)-[:NEXT_QUOTE {diff_seconds: duration.inSeconds(q5.created_date, q6.created_date).seconds}]->(q6)
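After loading, a quick tabular check can confirm what was created. The query below is only a sketch; it assumes the demo statement above has been run on an otherwise empty database, and the returned column names are illustrative.

```cypher
// List the demo quotes oldest-first to see what changed between them
MATCH (q:Quote)
RETURN q.change_info  AS change_info,
       q.firstname    AS firstname,
       q.postcode     AS postcode,
       q.passport     AS passport,
       q.created_date AS created_date
ORDER BY q.created_date
```

Ordering by created_date puts the rows in the same sequence as the NEXT_QUOTE chain, so each row can be compared with the one before it.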

4.3. Neo4j Schema

If you call:

// Show neo4j schema
CALL db.schema.visualization()

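You will see the schema of the demo graph: a single Quote node label connected to itself by the NEXT_QUOTE relationship.

5. Cypher Queries

5.1. View All Quotes in a Chain

In this query, we identify quote chains that meet the following requirement:

One quote is connected to another quote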

// View all quotes
MATCH path=()-[r:NEXT_QUOTE]->()
RETURN path;

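5.2. Split the `Quote` Chain by Time Difference

In the world of quotes, the time between quotes is a very important factor. Consider the following scenario:

Buying car insurance: Car insurance is typically bought annually, with a 12-month policy term. Comparing last year's quote with this year's new quote may therefore show clear differences:

No claims bonus: (ideally) one year more than the previous year.
Vehicle age: one year older than the previous year.
Mileage: expected to increase, by an amount that depends on the person's age, occupation, address, and similar factors.

To identify differences between quotes, we should split them into smaller time intervals, much like web sessions.

In this query, we identify quote chains that meet the following requirement:

All quotes occur within 3600 seconds (1 hour) of the one before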

// Split Quote Chain
MATCH path=()-[rel:NEXT_QUOTE]->()
WHERE rel.diff_seconds < 3600
RETURN path;

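The problem with this query is that, in table view, it reports Started streaming 3 records. In essence, Neo4j returns 3 separate records that match the path criteria and sends them to the browser for display. That may look fine visually, but it causes problems when analysing the path as a whole. The next query addresses this.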
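To see why the demo data yields three records, it can help to list every NEXT_QUOTE hop next to its time gap. This is a sketch against the demo graph; the column names, including within_one_hour, are illustrative.

```cypher
// One row per NEXT_QUOTE hop, flagged by whether it falls inside the 1-hour window
MATCH (a:Quote)-[rel:NEXT_QUOTE]->(b:Quote)
RETURN a.change_info AS from_quote,
       b.change_info AS to_quote,
       rel.diff_seconds AS diff_seconds,
       rel.diff_seconds < 3600 AS within_one_hour
ORDER BY a.created_date
```

Hops where within_one_hour is true are the ones returned by the split query; the remaining hops mark the boundaries between sessions.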


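5.3. Single `Quote` Path Record

This is an upgraded version of the previous Cypher query. It uses more advanced pattern matching and ensures that only one record is returned, while keeping the behaviour of the previous version:

A single chain
All quotes occur within 1 hour of each other
All quotes occur within the last 1000 days
Returns 1 record for further analysis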

MATCH path=(firstQ)-[r:NEXT_QUOTE*..1000]->(lastQ)
WHERE

    // Path termination condition (first)
    (NOT EXISTS{ (firstQ)<-[:NEXT_QUOTE]-() } OR EXISTS{ (firstQ)<-[x:NEXT_QUOTE]-() WHERE x.diff_seconds >= 3600 } )
    AND

    // Path termination condition (last)
    (NOT EXISTS{ (lastQ)-[:NEXT_QUOTE]->() } OR EXISTS{ (lastQ)-[x:NEXT_QUOTE]->() WHERE x.diff_seconds >= 3600 } )
    AND

    // No gaps condition (if you remove this condition then gaps are allowed and you get spurious longer chains that verify the end of path but not the max diff condition)
    ALL(x IN relationships(path) WHERE x.diff_seconds < 3600 )
    AND

    // Filter based on quote in the last N days
    firstQ.created_date > datetime() - duration({days: 1000})
    AND

    // Where there are more than one quote in the chain otherwise there is nothing to compare against
    length(path)> 1

RETURN path

If you look at the table view again, you can see that only a single record is returned.
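For further analysis, the single path can also be flattened into a tabular summary. The sketch below reuses the chain conditions from the query above and leaves out the 1000-day filter for brevity; the column names changes, hops, and session_seconds are illustrative assumptions.

```cypher
// Summarise each session chain as one row instead of a path
MATCH path=(firstQ:Quote)-[:NEXT_QUOTE*..1000]->(lastQ:Quote)
WHERE (NOT EXISTS { (firstQ)<-[:NEXT_QUOTE]-() }
       OR EXISTS { (firstQ)<-[x:NEXT_QUOTE]-() WHERE x.diff_seconds >= 3600 })
  AND (NOT EXISTS { (lastQ)-[:NEXT_QUOTE]->() }
       OR EXISTS { (lastQ)-[x:NEXT_QUOTE]->() WHERE x.diff_seconds >= 3600 })
  AND ALL(x IN relationships(path) WHERE x.diff_seconds < 3600)
  AND length(path) > 1
RETURN [n IN nodes(path) | n.change_info] AS changes,   // change labels in chain order
       length(path) AS hops,                            // number of NEXT_QUOTE hops
       reduce(total = 0, x IN relationships(path) | total + x.diff_seconds) AS session_seconds
```

Each row then describes one session: the ordered change labels, the number of hops, and the total time the applicant spent re-quoting.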


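5.4. Create a `SIMILARITY` Relationship with Scores

To score quotes, we need a relationship that brings together all quote properties so they can be evaluated both individually and as a whole.

In this query, we identify quote chains that meet the following requirements:

Get the full Quote chain for all Quote nodes within the last 1000 days
Get the full Quote chain for all Quote nodes where the time between consecutive quotes is less than 1 hour
Calculate the property scores
Write a new SIMILARITY relationship for the Quote chain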

// Create Similarity Relationship
MATCH path=(firstQ)-[r:NEXT_QUOTE*..1000]->(lastQ)
WHERE

    // Path termination condition (first)
    (NOT EXISTS{ (firstQ)<-[:NEXT_QUOTE]-() } OR EXISTS{ (firstQ)<-[x:NEXT_QUOTE]-() WHERE x.diff_seconds >= 3600 } )
    AND

    // Path termination condition (last)
    (NOT EXISTS{ (lastQ)-[:NEXT_QUOTE]->() } OR EXISTS{ (lastQ)-[x:NEXT_QUOTE]->() WHERE x.diff_seconds >= 3600 } )
    AND

    // No gaps condition (if you remove this condition then gaps are allowed and you get spurious longer chains that verify the end of path but not the max diff condition)
    ALL(x IN relationships(path) WHERE x.diff_seconds < 3600 )
    AND

    // Filter based on quote in the last N days
    firstQ.created_date > datetime() - duration({days: 1000})
    AND

    // Where there are more than one quote in the chain otherwise there is nothing to compare against
    length(path)> 1

WITH nodes(path) as nodes

// Iterate over the list in chain order we create an array [0,1,2,3... length - 2]
UNWIND range(0,size(nodes)-2) as index

// For each position (index) in the list take the node at that position (current) and the rest
WITH nodes[index] as current, nodes[index+1..size(nodes)] as rest

// Iterate over the rest keeping current to get all pairs of nodes without repetitions
UNWIND rest as subsequent

WITH current, subsequent,

// Build up similarity scores for all properties
// Strings
apoc.text.levenshteinSimilarity(current.firstname, subsequent.firstname) AS firstname,
apoc.text.levenshteinSimilarity(current.surname, subsequent.surname) AS surname,
apoc.text.levenshteinSimilarity(current.postcode, subsequent.postcode) AS postcode,

// Numbers
(current.passport - subsequent.passport) AS passport_number,
apoc.text.levenshteinSimilarity(toString(current.passport), toString(subsequent.passport)) AS passport_similarity,

// Dates
duration.inDays(current.dob, subsequent.dob).days AS dob,

// Location
toInteger(point.distance(point({longitude: current.longitude, latitude: current.latitude}), point({longitude: subsequent.longitude, latitude: subsequent.latitude}))) AS location

// Create :SIMILARITY Relationship
CREATE (current)-[:SIMILARITY {
    // Add change string for simplicity
    change: subsequent.change_info,

    // Strings
    firstname: firstname,
    surname: surname,
    postcode: postcode,

    // Numbers
    passport_number: passport_number,
    passport_similarity: passport_similarity,

    // Dates
    dob: dob,

    // Location
    location: location,

    // Calculated Similarity Score
    similarity_score: (firstname + surname + postcode + passport_similarity ) / 4
}]->(subsequent)
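The string scores in the statement above come from apoc.text.levenshteinSimilarity, which returns a value between 0 and 1, where 1 means the two strings are identical. As a small sketch, it can be tried on its own with values taken from the demo data; like the statement above, it assumes the APOC library is installed.

```cypher
// Compare the two first-name spellings and the two postcodes from the demo data
RETURN apoc.text.levenshteinSimilarity("Micheal", "Michael") AS firstname_similarity,
       apoc.text.levenshteinSimilarity("YO30 7DW", "PA62 6AA") AS postcode_similarity
```

A small spelling tweak keeps the score high, while a completely different postcode pulls it down, which is what lets the averaged similarity_score separate minor edits from substantial changes.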

View the newly created relationships:

// View all SIMILARITY relationships
MATCH path=()-[r:SIMILARITY]->()
RETURN path;
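The same relationships can be read as a table, which is often easier to scan than the graph view. This is a sketch; the column names are illustrative, and distance_metres assumes the WGS-84 points built from longitude and latitude, for which point.distance reports metres.

```cypher
// Tabular view of every SIMILARITY relationship, least similar pair first
MATCH (a:Quote)-[s:SIMILARITY]->(b:Quote)
RETURN a.change_info AS from_quote,
       b.change_info AS to_quote,
       s.similarity_score AS similarity_score,
       s.location AS distance_metres,
       s.dob AS dob_days_apart
ORDER BY s.similarity_score
```

Because the statement in this section uses CREATE, running it a second time adds a second set of relationships; running `MATCH ()-[s:SIMILARITY]->() DELETE s` beforehand clears the existing ones.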

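5.5. Static Scoring

In this query, we identify quote chains that meet the following requirement:

Calculate the score returned to the user from the SIMILARITY relationships created by the query in section 5.4.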

// Calculate static Fraud Score
MATCH path=(a)-[r:SIMILARITY]->(b)
WHERE a.created_date > datetime() - duration({days: 1000})
// Aggregate once; avg() yields null instead of dividing by zero when nothing matches
WITH avg(r.similarity_score) AS similarity, count(r) AS relCount
RETURN similarity AS Similarity,
CASE
    WHEN relCount = 0 THEN 'Additional Quote Needs Adding'
    // Bands: above 70 is LOW, 51 to 70 is MEDIUM, 50 and below is HIGH
    WHEN toInteger(similarity * 100) > 70 THEN 'LOW'
    WHEN toInteger(similarity * 100) > 50 THEN 'MEDIUM'
    ELSE 'HIGH'
END AS Fraud_Level
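The banding can be tried in isolation, without touching the graph. The sketch below feeds three made-up similarity values through the same kind of CASE expression; the values are illustrative, and using ELSE for the lowest band is an assumption about how scores that land exactly on a boundary should be treated.

```cypher
// Illustrative scores only; they are not taken from the demo data
UNWIND [0.95, 0.65, 0.40] AS similarity
RETURN similarity,
CASE
    WHEN toInteger(similarity * 100) > 70 THEN 'LOW'
    WHEN toInteger(similarity * 100) > 50 THEN 'MEDIUM'
    ELSE 'HIGH'
END AS Fraud_Level
```

The three values map to LOW, MEDIUM, and HIGH respectively: the more the quotes in a session resemble each other, the lower the fraud level.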

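5.6. Real-Time Fraud Scoring

For the final Cypher query, we add a new quote to Neo4j and run the fraud score calculation to get a real-time response showing the similarity score. This code can be used in an API backend or directly in Cypher to provide real-time fraud alerts.

In this query, we identify quote chains that meet the following requirements:

Get the last quote
Create a new Quote appended to the end of the chain
Get the full Quote chain for all Quote nodes within the last 1000 days
Get the full Quote chain for all Quote nodes where the time between consecutive quotes is less than 1 hour
Calculate the property scores
Write a new SIMILARITY relationship for the Quote chain
Calculate the score and return it to the user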

// // // Realtime Quote Score // // //

// Get last `Quote` node in quote chain
MATCH (last:Quote)
WITH last
ORDER BY last.created_date DESC
LIMIT 1
WITH last
// Create new quote node
MERGE (current:Quote {
    change_info: "changed dob",
    created_date: datetime(),
    dob: Date("1978-11-30"),
    firstname: "Michael",
    surname: "Down",
    latitude: 56.359258,
    longitude: -5.851487,
    passport: 584699530,
    postcode: "PA62 6AA"
})
WITH last, current, duration.inSeconds(last.created_date, current.created_date) AS time
// Create relationship
CREATE (last)-[:NEXT_QUOTE {diff_seconds: time.seconds}]->(current)

WITH current

// Minimum comparison
MATCH path=(firstQ)-[r:NEXT_QUOTE*0..100]->(current)
WHERE

    // Path termination condition (first)
    (NOT EXISTS{ (firstQ)<-[:NEXT_QUOTE]-() } OR EXISTS{ (firstQ)<-[x:NEXT_QUOTE]-() WHERE x.diff_seconds >= 3600 } )
    AND

    // Path termination condition (last): the path ends at the new quote, so test `current`
    (NOT EXISTS{ (current)-[:NEXT_QUOTE]->() } OR EXISTS{ (current)-[x:NEXT_QUOTE]->() WHERE x.diff_seconds >= 3600 } )
    AND

    // No gaps condition (if you remove this condition then gaps are allowed and you get spurious longer chains that verify the end of path but not the max diff condition)
    ALL(x IN relationships(path) WHERE x.diff_seconds < 3600 )
    AND

    // Filter based on quote in the last N days
    firstQ.created_date > datetime() - duration({days: 1000})
    AND

    // Where there are more than one quote in the chain otherwise there is nothing to compare against
    length(path)> 1

//let's keep just the nodes in the chain
UNWIND nodes(path)[0..-1] as subsequent

WITH current, subsequent,

// Build up similarity scores for all properties
// Strings
apoc.text.levenshteinSimilarity(current.firstname, subsequent.firstname) AS firstname,
apoc.text.levenshteinSimilarity(current.surname, subsequent.surname) AS surname,
apoc.text.levenshteinSimilarity(current.postcode, subsequent.postcode) AS postcode,

// Numbers
(current.passport - subsequent.passport) AS passport_number,
apoc.text.levenshteinSimilarity(toString(current.passport), toString(subsequent.passport)) AS passport_similarity,

// Dates
duration.inDays(current.dob, subsequent.dob).days AS dob,

// Location
toInteger(point.distance(point({longitude: current.longitude, latitude: current.latitude}), point({longitude: subsequent.longitude, latitude: subsequent.latitude}))) AS location

// Create :SIMILARITY Relationship
CREATE (current)-[:SIMILARITY {
    // Add change string for simplicity
    change: subsequent.change_info,

    // Strings
    firstname: firstname,
    surname: surname,
    postcode: postcode,

    // Numbers
    passport_number: passport_number,
    passport_similarity: passport_similarity,

    // Dates
    dob: dob,

    // Location
    location: location,

    // Calculated Similarity Score
    similarity_score: (firstname + surname + postcode + passport_similarity ) / 4
}]->(subsequent)

WITH *

// Quote - 3 - Calculate Fraud Score
MATCH p=(a)-[r:SIMILARITY]->(b)
WHERE a.created_date > datetime() - duration({days: 1000})
// Aggregate once; avg() yields null instead of dividing by zero when nothing matches
WITH avg(r.similarity_score) AS similarity, count(r) AS relCount
RETURN similarity AS Similarity,
CASE
    WHEN relCount = 0 THEN 'Run Again'
    // Bands: above 70 is LOW, 51 to 70 is MEDIUM, 50 and below is HIGH
    WHEN toInteger(similarity * 100) > 70 THEN 'LOW'
    WHEN toInteger(similarity * 100) > 50 THEN 'MEDIUM'
    ELSE 'HIGH'
END AS Fraud_Level;
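When this runs behind an API, the quote details would normally arrive as input rather than as literals in the statement. The sketch below shows only the first step, appending a quote to the chain, with a map standing in for a hypothetical $quote parameter; the map's shape and the name quote are assumptions, and it expects at least one Quote node to exist already.

```cypher
// Stand-in for a $quote parameter supplied by an API backend (hypothetical shape)
WITH {firstname: "Michael", surname: "Down", dob: "1978-11-30",
      postcode: "PA62 6AA", latitude: 56.359258, longitude: -5.851487,
      passport: 584699530, change_info: "changed dob"} AS quote

// Most recent quote already in the graph
MATCH (last:Quote)
WITH quote, last
ORDER BY last.created_date DESC
LIMIT 1

// Append the incoming quote to the end of the chain
CREATE (current:Quote {
    firstname: quote.firstname,
    surname: quote.surname,
    dob: date(quote.dob),
    postcode: quote.postcode,
    latitude: quote.latitude,
    longitude: quote.longitude,
    passport: quote.passport,
    change_info: quote.change_info,
    created_date: datetime()
})
CREATE (last)-[:NEXT_QUOTE {
    diff_seconds: duration.inSeconds(last.created_date, current.created_date).seconds
}]->(current)
RETURN current
```

Replacing the leading WITH map with a real query parameter keeps user input out of the query text. Note that each run adds another Quote node to the graph.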
