Multi-omics Data Integration

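Building on single-omics integration, we further enhance biological insight by connecting additional omics data types (proteomics, metabolomics) and ontology information directly into the knowledge graph. Linking standardized biomedical ontologies, such as the Gene Ontology (GO), the Mondo Disease Ontology (MONDO), and phenotype ontologies (EFO, HPO), enriches the multi-omics graph with a curated knowledge layer. Each gene, protein, metabolite, or other biological entity in the graph can be annotated with relevant ontology terms, which enables not only data integration but also deep semantic enrichment.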
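Every ontology-backed node in the demo script stores a CURIE-style `sid` (for example `GO:0008286`, `MONDO:0005359`, `UBERON:0002107`), so the source ontology of an annotation can be read from its prefix. The following is a minimal sketch that groups identifiers by prefix; the `ONTOLOGY_NAMES` table and the `group_by_ontology` helper are illustrative assumptions, not part of the demo. Note that HPO term identifiers use the `HP:` prefix.

```python
from collections import defaultdict

# Illustrative prefix -> ontology table (an assumption, not part of the demo script).
ONTOLOGY_NAMES = {
    "GO": "Gene Ontology",
    "MONDO": "Mondo Disease Ontology",
    "EFO": "Experimental Factor Ontology",
    "HP": "Human Phenotype Ontology",
    "UBERON": "Uberon anatomy ontology",
}


def group_by_ontology(sids):
    """Group CURIE-style identifiers by prefix; anything else goes to 'other'."""
    groups = defaultdict(list)
    for sid in sids:
        prefix, sep, _local_id = sid.partition(":")
        key = prefix if sep and prefix in ONTOLOGY_NAMES else "other"
        groups[key].append(sid)
    return dict(groups)


# A few sids from the demo script; 'P48736' is a UniProt accession without a prefix.
demo_sids = ["GO:0008286", "GO:0006629", "MONDO:0005359", "EFO:0004220",
             "UBERON:0002107", "P48736"]
for prefix, members in group_by_ontology(demo_sids).items():
    print(ONTOLOGY_NAMES.get(prefix, "other / non-ontology"), members)
```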

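This ontology-driven approach lets researchers query molecular data by concept: for example, find all genes involved in a specific biological process (a GO term), all proteins annotated as drug targets, or all molecular entities associated with a disease category across experiments. By linking omics data to ontology nodes and relationships, the knowledge graph supports cross-omics reasoning (for example, transcript changes that affect disease-related pathways), inference over the hierarchical structure of the vocabularies, and federated analysis across datasets and domains. The result is a truly interconnected landscape of data and knowledge that accelerates hypothesis generation, translational discovery, and precision medicine.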
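As a sketch of a concept-driven query, the Python below walks the demo's `(Gene)-[:CODES]->(Protein)-[:ASSOCIATED_WITH]->(GO)` path in memory to list the genes annotated with a GO term. The edge lists are a subset copied from the demo script, and `genes_for_go_term` is a hypothetical helper; against the live database, the equivalent Cypher in the docstring (with an assumed `$goId` parameter) would be used instead.

```python
# Gene -[:CODES]-> Protein, copied from the demo script.
CODES = {
    "PIK3CG": "P48736", "CXCL10": "P02778", "MTOR": "P42345", "AKT1": "P31749",
    "PPARG": "P37231", "CD36": "P16671", "ALB": "P02768",
}

# Protein -[:ASSOCIATED_WITH]-> GO, a subset of the demo script.
ASSOCIATED_WITH = [
    ("P48736", "GO:0008286"), ("P42345", "GO:0008286"), ("P31749", "GO:0008286"),
    ("P48736", "GO:0042593"), ("P31749", "GO:0042593"), ("P42345", "GO:0042593"),
    ("P01308", "GO:0042593"), ("P16671", "GO:0006629"), ("P37231", "GO:0006629"),
]


def genes_for_go_term(go_id):
    """Sorted gene symbols whose protein product is annotated with go_id.

    In-memory equivalent of:
      MATCH (g:Gene)-[:CODES]->(:Protein)-[:ASSOCIATED_WITH]->(:GO {sid: $goId})
      RETURN DISTINCT g.symbol AS symbol ORDER BY symbol
    """
    proteins = {protein for protein, go in ASSOCIATED_WITH if go == go_id}
    return sorted(gene for gene, protein in CODES.items() if protein in proteins)


print(genes_for_go_term("GO:0008286"))  # insulin receptor signaling pathway
print(genes_for_go_term("GO:0006629"))  # lipid metabolic process
```

Because the query starts from Gene nodes, proteins without a Gene node are dropped: `GO:0042593` (glucose homeostasis) is also annotated on `P01308` (INS), but the demo defines no INS gene, so INS does not appear in the gene-level result.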

Scenario

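Building on the single-omics foundation, we now extend the integration to support multiple omics layers such as proteomics and metabolomics. By mapping the relationships between genes, their protein products, and metabolic pathways, the knowledge graph captures the full spectrum from transcriptomic to proteomic to metabolomic context. This unified representation lets researchers trace molecular changes across omics types and gain a holistic view of biological systems and disease mechanisms. We also add ontology information to nodes and relationships to enrich the graph with a curated knowledge layer.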
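Tracing a change across omics layers amounts to following `(Experiment)-[:HAS_VALUE]->(Gene)-[:CODES]->(Protein)<-[:HAS_VALUE]-(Experiment)`. The sketch below does this in memory for the NAFLD RNA-seq experiment (EXP001) and the DIA-PASEF proteomics experiment (EXP007), using logFC values copied from the demo script, and flags whether mRNA and protein change in the same direction. `trace_across_omics` is a hypothetical helper, and treating sign agreement as "concordance" is an assumption of this illustration.

```python
# logFC values copied from the demo script.
RNA_EXP001 = {"PIK3CG": 2.5, "CXCL10": 3.2, "CD36": 2.8, "ALB": -1.5}   # (Experiment)-[:HAS_VALUE]->(Gene)
PROT_EXP007 = {"P48736": 2.4, "P16671": 2.9, "P31749": 1.7, "P05067": 1.4,
               "P10636": 1.6, "P02768": -1.5}                          # (Experiment)-[:HAS_VALUE]->(Protein)
CODES = {"PIK3CG": "P48736", "CXCL10": "P02778", "MTOR": "P42345",
         "AKT1": "P31749", "PPARG": "P37231", "CD36": "P16671", "ALB": "P02768"}


def trace_across_omics(rna, protein, codes):
    """Map gene -> (mRNA logFC, protein logFC, same_direction) for genes measured at both levels."""
    result = {}
    for gene, rna_fc in rna.items():
        accession = codes.get(gene)
        if accession is None or accession not in protein:
            continue  # no protein product in the graph, or the protein was not measured
        prot_fc = protein[accession]
        result[gene] = (rna_fc, prot_fc, (rna_fc > 0) == (prot_fc > 0))
    return result


for gene, (m, p, same) in trace_across_omics(RNA_EXP001, PROT_EXP007, CODES).items():
    print(f"{gene}: mRNA {m:+.1f}, protein {p:+.1f}, same direction={same}")
```

CXCL10 is skipped because its protein (P02778) is not measured in EXP007; PIK3CG, CD36, and ALB change in the same direction at both levels.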

Data Model

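This data model extends the model from the single-omics data integration page with additional omics data, such as proteomics and metabolomics, and with data from ontologies such as GO and EFO.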

  • New Protein nodes

  • New ontology nodes

  • New ontology relationships

  • New proteomics experiments

  • New metabolomics experiments

  • New multi-omics comparison (COMP009)
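The new node types are easiest to reason about as (start label, relationship type, end label) triples. The sketch below transcribes the relationships that the demo script creates and finds the shortest directed label path between two node types, rendered as a Cypher pattern. The `SCHEMA` list and `metapath` helper are assumptions written for this illustration; Pathway nodes are omitted because the demo script creates them without any relationships.

```python
from collections import deque

# (start label, relationship type, end label), transcribed from the demo script.
SCHEMA = [
    ("Project", "HAS_SAMPLE", "Sample"),
    ("Sample", "TAKEN_FROM", "Tissue"),
    ("Sample", "HAS_PHENOTYPE", "EFO"),
    ("Sample", "HAS_EXPERIMENT", "Experiment"),
    ("Comparison", "COMPARES", "Experiment"),
    ("Comparison", "INCLUDES_CASE", "Sample"),
    ("Comparison", "INCLUDES_CONTROL", "Sample"),
    ("Comparison", "STUDIES_DISEASE", "Disease"),
    ("Comparison", "FOUND_DIFFERENTIAL", "Gene"),
    ("Comparison", "FOUND_DIFFERENTIAL", "Protein"),
    ("Experiment", "HAS_VALUE", "Gene"),
    ("Experiment", "HAS_VALUE", "Protein"),
    ("Gene", "CODES", "Protein"),
    ("Gene", "MAPPED", "ID"),
    ("Gene", "RELATED_TO", "Disease"),
    ("Protein", "ASSOCIATED_WITH", "GO"),
]


def metapath(start, end):
    """Shortest directed label path from start to end as a Cypher pattern, or None."""
    parent = {start: None}  # label -> (previous label, relationship type)
    queue = deque([start])
    while queue:  # breadth-first search over node labels
        label = queue.popleft()
        if label == end:
            break
        for src, rel, dst in SCHEMA:
            if src == label and dst not in parent:
                parent[dst] = (label, rel)
                queue.append(dst)
    if end not in parent:
        return None
    # Walk back from end to start, then render the pattern left to right.
    parts = [f"(:{end})"]
    label = end
    while parent[label] is not None:
        prev, rel = parent[label]
        parts.append(f"(:{prev})-[:{rel}]->")
        label = prev
    return "".join(reversed(parts))


print(metapath("Comparison", "GO"))
```

For example, `metapath("Experiment", "GO")` returns `(:Experiment)-[:HAS_VALUE]->(:Protein)-[:ASSOCIATED_WITH]->(:GO)`, the pattern to use when enriching proteomics hits with GO terms.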

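The (Experiment)-[:HAS_VALUE]->(Protein) relationship now includes:

  • logFC: change in protein abundance (log fold change)

  • pValue: statistical significance

  • regulated: direction of regulation (up/down)

  • intensity: mass spectrometry (MS) intensity value

  • peptides: number of identified peptides

  • coverage: protein sequence coverage, in percent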
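The properties listed above can be modelled as a typed record before they are written to the graph. This is a minimal sketch, assuming Python dataclasses and a few consistency rules that the demo itself does not define (for instance, a positive `logFC` must be marked `"up"`); `ProteinMeasurement` is a hypothetical name.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ProteinMeasurement:
    """Properties of one (Experiment)-[:HAS_VALUE]->(Protein) relationship."""
    logFC: float      # change in protein abundance (log fold change)
    pValue: float     # statistical significance
    regulated: str    # "up" or "down"
    intensity: float  # MS intensity value
    peptides: int     # number of identified peptides
    coverage: float   # sequence coverage, percent

    def __post_init__(self):
        # Assumed consistency rules; the demo script does not validate anything.
        if self.regulated not in ("up", "down"):
            raise ValueError(f"regulated must be 'up' or 'down', got {self.regulated!r}")
        if (self.logFC > 0) != (self.regulated == "up"):
            raise ValueError("sign of logFC disagrees with 'regulated'")
        if not 0.0 <= self.pValue <= 1.0:
            raise ValueError("pValue must lie in [0, 1]")
        if not 0.0 <= self.coverage <= 100.0:
            raise ValueError("coverage is a percentage in [0, 100]")
        if self.peptides < 0:
            raise ValueError("peptides must be non-negative")

    def as_cypher_params(self) -> dict:
        """Plain dict of the properties, e.g. for a $props query parameter."""
        return asdict(self)


# CD36 (P16671) in EXP004, values from the demo script.
cd36 = ProteinMeasurement(logFC=3.1, pValue=0.0003, regulated="up",
                          intensity=5.8e7, peptides=18, coverage=62.1)
print(cd36.as_cypher_params())
```

The returned dict could be passed as a `$props` parameter to `MATCH (e:Experiment {sid: $exp}), (p:Protein {sid: $acc}) MERGE (e)-[r:HAS_VALUE]->(p) SET r += $props`. Merging on the pattern alone and then setting properties updates an existing relationship in place, whereas putting every property inside `MERGE`, as the demo does, creates a second relationship if any value differs from an earlier load.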

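Multi-omics Comparison (COMP009)

A new comparison that integrates NAFLD (non-alcoholic fatty liver disease) transcriptomic and proteomic data, including:

  • RNA-seq (EXP001)

  • Proteomics, TMT-MS (EXP004)

  • Proteomics, DIA-PASEF (EXP007)

  • Correlation coefficient values showing mRNA-protein concordance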
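The per-gene `correlation` values stored on COMP009's FOUND_DIFFERENTIAL relationships (0.85 for PIK3CG, 0.92 for CD36) are not derived in the demo, and a per-gene coefficient would normally require paired per-sample mRNA and protein measurements, which the demo does not include. As a related but different illustration, the sketch below computes a Pearson correlation across genes between the RNA-seq (EXP001) and DIA-PASEF (EXP007) logFC values from the demo script. The `pearson` helper is written here for illustration, and three genes are far too few for a meaningful estimate.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# logFC per gene: EXP001 (RNA-seq) and EXP007 (DIA-PASEF, mapped to genes via CODES).
mrna = {"PIK3CG": 2.5, "CD36": 2.8, "ALB": -1.5}
protein = {"PIK3CG": 2.4, "CD36": 2.9, "ALB": -1.5}
genes = sorted(mrna.keys() & protein.keys())  # genes measured at both levels
r = pearson([mrna[g] for g in genes], [protein[g] for g in genes])
print(f"mRNA-protein logFC correlation across {len(genes)} genes: r = {r:.3f}")
```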

[Figure: Industry Use Cases, multi-omics with ontologies]

Demo Data

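The following Cypher statements create the sample graph in a Neo4j database. The statements are separated by semicolons, so run them as a multi-statement script (for example, with cypher-shell or in Neo4j Browser).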

// MERGE Projects
MERGE (p1:Project {sid: "PROJ001", name: "Liver Disease Study"})
MERGE (p2:Project {sid: "PROJ002", name: "Diabetes Research"})
MERGE (p3:Project {sid: "PROJ003", name: "Cross-Disease Metabolic Study"})

// MERGE Tissues
MERGE (t1:Tissue {sid: "UBERON:0002107", name: "Liver"})
MERGE (t2:Tissue {sid: "UBERON:0001264", name: "Pancreas"})
MERGE (t3:Tissue {sid: "UBERON:0000945", name: "Adipose tissue"})

// MERGE Diseases
MERGE (d1:Disease {sid: "MONDO:0005359", name: "Non-alcoholic fatty liver disease"})
MERGE (d2:Disease {sid: "MONDO:0005015", name: "Type 2 Diabetes"})
MERGE (d3:Disease {sid: "MONDO:0011382", name: "Metabolic Syndrome"})

// MERGE Phenotypes (EFO)
MERGE (ph1:EFO {sid: "EFO:0004220", name: "Insulin resistance"})
MERGE (ph2:EFO {sid: "EFO:0001421", name: "Elevated triglycerides"})
MERGE (ph3:EFO {sid: "EFO:0004465", name: "Hepatic steatosis"})
MERGE (ph4:EFO {sid: "EFO:0000685", name: "Obesity"})

// MERGE Samples
MERGE (s1:Sample {sid: "SAMPLE001", name: "Patient_001_Liver", condition: "NAFLD"})
MERGE (s2:Sample {sid: "SAMPLE002", name: "Control_001_Liver", condition: "Healthy"})
MERGE (s3:Sample {sid: "SAMPLE003", name: "Patient_002_Pancreas", condition: "T2D"})
MERGE (s4:Sample {sid: "SAMPLE004", name: "Control_002_Pancreas", condition: "Healthy"})
MERGE (s5:Sample {sid: "SAMPLE005", name: "Patient_003_Liver", condition: "NAFLD"})
MERGE (s6:Sample {sid: "SAMPLE006", name: "Patient_004_Adipose", condition: "MetSyn"})
MERGE (s7:Sample {sid: "SAMPLE007", name: "Control_003_Adipose", condition: "Healthy"})

// MERGE Experiments - RNA-seq
MERGE (e1:Experiment {sid: "EXP001", type: "RNA-seq", platform: "Illumina NovaSeq"})
MERGE (e2:Experiment {sid: "EXP002", type: "RNA-seq", platform: "Illumina NovaSeq"})
MERGE (e3:Experiment {sid: "EXP003", type: "RNA-seq", platform: "Illumina NovaSeq"})

// MERGE Experiments - Proteomics
MERGE (e4:Experiment {sid: "EXP004", type: "Proteomics", platform: "Orbitrap Fusion", method: "TMT-MS"})
MERGE (e5:Experiment {sid: "EXP005", type: "Proteomics", platform: "Orbitrap Fusion", method: "TMT-MS"})
MERGE (e6:Experiment {sid: "EXP006", type: "Proteomics", platform: "Q Exactive HF", method: "Label-free quantification"})
MERGE (e7:Experiment {sid: "EXP007", type: "Proteomics", platform: "timsTOF Pro", method: "DIA-PASEF"})

// ============================================
// EXTENDED COMPARISON NODES
// ============================================

// Basic disease vs control comparisons
MERGE (comp1:Comparison {
  sid: "COMP001",
  name: "NAFLD vs Control (Liver)",
  type: "disease_vs_control",
  tissue: "Liver",
  n_case: 2,
  n_control: 1,
  analysis_date: "2024-01-15"
})

MERGE (comp2:Comparison {
  sid: "COMP002",
  name: "T2D vs Control (Pancreas)",
  type: "disease_vs_control",
  tissue: "Pancreas",
  n_case: 1,
  n_control: 1,
  analysis_date: "2024-01-20"
})

MERGE (comp3:Comparison {
  sid: "COMP003",
  name: "Metabolic Syndrome vs Control (Adipose)",
  type: "disease_vs_control",
  tissue: "Adipose",
  n_case: 1,
  n_control: 1,
  analysis_date: "2024-02-01"
})

// Cross-tissue comparison (different diseases in different tissues)
MERGE (comp4:Comparison {
  sid: "COMP004",
  name: "NAFLD Liver vs T2D Pancreas",
  type: "cross_tissue_disease",
  tissue: "Liver vs Pancreas",
  description: "Compare molecular signatures between NAFLD and T2D",
  analysis_date: "2024-02-10"
})

// Phenotype-based comparison
MERGE (comp6:Comparison {
  sid: "COMP006",
  name: "Insulin Resistant vs Non-Resistant",
  type: "phenotype_stratified",
  stratification: "Insulin resistance status",
  description: "Compare samples with vs without insulin resistance",
  analysis_date: "2024-02-20"
})

// Multi-disease meta-analysis
MERGE (comp8:Comparison {
  sid: "COMP008",
  name: "Pan-Metabolic Disease Signature",
  type: "meta_analysis",
  diseases: "NAFLD, T2D, MetSyn",
  description: "Common molecular signatures across metabolic diseases",
  analysis_date: "2024-03-05"
})

// ============================================
// MERGE GENES with expression data
// ============================================

MERGE (g1:Gene {sid: "ENSG00000105851", symbol: "PIK3CG", name: "Phosphatidylinositol-3-kinase catalytic gamma", source: "Ensembl"})
MERGE (g2:Gene {sid: "ENSG00000169245", symbol: "CXCL10", name: "C-X-C motif chemokine ligand 10", source: "Ensembl"})
MERGE (g3:Gene {sid: "ENSG00000198793", symbol: "MTOR", name: "Mechanistic target of rapamycin kinase", source: "Ensembl"})
MERGE (g4:Gene {sid: "ENSG00000142208", symbol: "AKT1", name: "AKT serine/threonine kinase 1", source: "Ensembl"})
MERGE (g5:Gene {sid: "ENSG00000132170", symbol: "PPARG", name: "Peroxisome proliferator activated receptor gamma", source: "Ensembl"})
MERGE (g6:Gene {sid: "ENSG00000135218", symbol: "CD36", name: "CD36 molecule", source: "Ensembl"})
MERGE (g7:Gene {sid: "ENSG00000163631", symbol: "ALB", name: "Albumin", source: "Ensembl"})
MERGE (g8:Gene {sid: "ENSG00000169429", symbol: "CXCL8", name: "C-X-C motif chemokine ligand 8", source: "Ensembl"})

// MERGE IDs (alternative identifiers)
MERGE (id1:ID {sid: "5294", source: "NCBI"})
MERGE (id2:ID {sid: "3627", source: "NCBI"})
MERGE (id3:ID {sid: "2475", source: "NCBI"})
MERGE (id4:ID {sid: "207", source: "NCBI"})
MERGE (id5:ID {sid: "5468", source: "NCBI"})

// MERGE Proteins
MERGE (pr1:Protein {sid: "P48736", source: "UniProt", name: "PIK3CG", gene_name: "PIK3CG"})
MERGE (pr2:Protein {sid: "P02778", source: "UniProt", name: "CXCL10", gene_name: "CXCL10"})
MERGE (pr3:Protein {sid: "P42345", source: "UniProt", name: "MTOR", gene_name: "MTOR"})
MERGE (pr4:Protein {sid: "P31749", source: "UniProt", name: "AKT1", gene_name: "AKT1"})
MERGE (pr5:Protein {sid: "P37231", source: "UniProt", name: "PPARG", gene_name: "PPARG"})
MERGE (pr6:Protein {sid: "P16671", source: "UniProt", name: "CD36", gene_name: "CD36"})
MERGE (pr7:Protein {sid: "P02768", source: "UniProt", name: "ALB", gene_name: "ALB"})
MERGE (pr8:Protein {sid: "P05067", source: "UniProt", name: "APP", gene_name: "APP"})
MERGE (pr9:Protein {sid: "P01308", source: "UniProt", name: "INS", gene_name: "INS"})
MERGE (pr10:Protein {sid: "P10636", source: "UniProt", name: "MAPT", gene_name: "MAPT"})
MERGE (pr11:Protein {sid: "P35354", source: "UniProt", name: "PTGS2", gene_name: "PTGS2"})
MERGE (pr12:Protein {sid: "P01375", source: "UniProt", name: "TNF", gene_name: "TNF"})

// MERGE GO terms
MERGE (go1:GO {sid: "GO:0005158", name: "insulin receptor binding"})
MERGE (go2:GO {sid: "GO:0006954", name: "inflammatory response"})
MERGE (go3:GO {sid: "GO:0043066", name: "negative regulation of apoptosis"})
MERGE (go4:GO {sid: "GO:0008286", name: "insulin receptor signaling pathway"})
MERGE (go5:GO {sid: "GO:0006629", name: "lipid metabolic process"})
MERGE (go6:GO {sid: "GO:0006955", name: "immune response"})
MERGE (go7:GO {sid: "GO:0071356", name: "cellular response to tumor necrosis factor"})
MERGE (go8:GO {sid: "GO:0042593", name: "glucose homeostasis"})
MERGE (go9:GO {sid: "GO:0006006", name: "glucose metabolic process"})
MERGE (go10:GO {sid: "GO:0030154", name: "cell differentiation"})
MERGE (go11:GO {sid: "GO:0051091", name: "positive regulation of transcription factor activity"})

// MERGE Pathway nodes (higher-level biological pathways)
MERGE (pw1:Pathway {sid: "KEGG:04910", name: "Insulin signaling pathway", source: "KEGG"})
MERGE (pw2:Pathway {sid: "KEGG:04064", name: "NF-kappa B signaling pathway", source: "KEGG"})
MERGE (pw3:Pathway {sid: "KEGG:04151", name: "PI3K-Akt signaling pathway", source: "KEGG"})
MERGE (pw4:Pathway {sid: "KEGG:04150", name: "mTOR signaling pathway", source: "KEGG"})
MERGE (pw5:Pathway {sid: "KEGG:04920", name: "Adipocytokine signaling pathway", source: "KEGG"})
MERGE (pw6:Pathway {sid: "KEGG:03320", name: "PPAR signaling pathway", source: "KEGG"})
MERGE (pw7:Pathway {sid: "REACTOME:R-HSA-74751", name: "Insulin receptor signaling cascade", source: "Reactome"})
MERGE (pw8:Pathway {sid: "REACTOME:R-HSA-449147", name: "Signaling by Interleukins", source: "Reactome"})
MERGE (pw10:Pathway {sid: "WIKIPATHWAYS:WP1471", name: "Inflammatory Response Pathway", source: "WikiPathways"});

// ============================================
// RELATIONSHIPS: PROJECT -> SAMPLE
// ============================================

MATCH (p:Project {sid: "PROJ001"}), (s:Sample {sid: "SAMPLE001"})
MERGE (p)-[:HAS_SAMPLE]->(s);
MATCH (p:Project {sid: "PROJ001"}), (s:Sample {sid: "SAMPLE002"})
MERGE (p)-[:HAS_SAMPLE]->(s);
MATCH (p:Project {sid: "PROJ001"}), (s:Sample {sid: "SAMPLE005"})
MERGE (p)-[:HAS_SAMPLE]->(s);

MATCH (p:Project {sid: "PROJ002"}), (s:Sample {sid: "SAMPLE003"})
MERGE (p)-[:HAS_SAMPLE]->(s);
MATCH (p:Project {sid: "PROJ002"}), (s:Sample {sid: "SAMPLE004"})
MERGE (p)-[:HAS_SAMPLE]->(s);

MATCH (p:Project {sid: "PROJ003"}), (s:Sample {sid: "SAMPLE006"})
MERGE (p)-[:HAS_SAMPLE]->(s);
MATCH (p:Project {sid: "PROJ003"}), (s:Sample {sid: "SAMPLE007"})
MERGE (p)-[:HAS_SAMPLE]->(s);

// ============================================
// RELATIONSHIPS: SAMPLE -> TISSUE
// ============================================

MATCH (s:Sample {sid: "SAMPLE001"}), (t:Tissue {sid: "UBERON:0002107"})
MERGE (s)-[:TAKEN_FROM]->(t);
MATCH (s:Sample {sid: "SAMPLE002"}), (t:Tissue {sid: "UBERON:0002107"})
MERGE (s)-[:TAKEN_FROM]->(t);
MATCH (s:Sample {sid: "SAMPLE005"}), (t:Tissue {sid: "UBERON:0002107"})
MERGE (s)-[:TAKEN_FROM]->(t);

MATCH (s:Sample {sid: "SAMPLE003"}), (t:Tissue {sid: "UBERON:0001264"})
MERGE (s)-[:TAKEN_FROM]->(t);
MATCH (s:Sample {sid: "SAMPLE004"}), (t:Tissue {sid: "UBERON:0001264"})
MERGE (s)-[:TAKEN_FROM]->(t);

MATCH (s:Sample {sid: "SAMPLE006"}), (t:Tissue {sid: "UBERON:0000945"})
MERGE (s)-[:TAKEN_FROM]->(t);
MATCH (s:Sample {sid: "SAMPLE007"}), (t:Tissue {sid: "UBERON:0000945"})
MERGE (s)-[:TAKEN_FROM]->(t);

// ============================================
// RELATIONSHIPS: SAMPLE -> PHENOTYPE
// ============================================

MATCH (s:Sample {sid: "SAMPLE001"}), (ph:EFO {sid: "EFO:0001421"})
MERGE (s)-[:HAS_PHENOTYPE]->(ph);
MATCH (s:Sample {sid: "SAMPLE001"}), (ph:EFO {sid: "EFO:0004465"})
MERGE (s)-[:HAS_PHENOTYPE]->(ph);

MATCH (s:Sample {sid: "SAMPLE003"}), (ph:EFO {sid: "EFO:0004220"})
MERGE (s)-[:HAS_PHENOTYPE]->(ph);

MATCH (s:Sample {sid: "SAMPLE005"}), (ph:EFO {sid: "EFO:0004465"})
MERGE (s)-[:HAS_PHENOTYPE]->(ph);

MATCH (s:Sample {sid: "SAMPLE006"}), (ph:EFO {sid: "EFO:0000685"})
MERGE (s)-[:HAS_PHENOTYPE]->(ph);
MATCH (s:Sample {sid: "SAMPLE006"}), (ph:EFO {sid: "EFO:0004220"})
MERGE (s)-[:HAS_PHENOTYPE]->(ph);

// ============================================
// RELATIONSHIPS: SAMPLE -> EXPERIMENT
// ============================================

MATCH (s:Sample {sid: "SAMPLE001"}), (e:Experiment {sid: "EXP001"})
MERGE (s)-[:HAS_EXPERIMENT]->(e);
MATCH (s:Sample {sid: "SAMPLE002"}), (e:Experiment {sid: "EXP001"})
MERGE (s)-[:HAS_EXPERIMENT]->(e);
MATCH (s:Sample {sid: "SAMPLE005"}), (e:Experiment {sid: "EXP001"})
MERGE (s)-[:HAS_EXPERIMENT]->(e);

MATCH (s:Sample {sid: "SAMPLE003"}), (e:Experiment {sid: "EXP002"})
MERGE (s)-[:HAS_EXPERIMENT]->(e);
MATCH (s:Sample {sid: "SAMPLE004"}), (e:Experiment {sid: "EXP002"})
MERGE (s)-[:HAS_EXPERIMENT]->(e);

MATCH (s:Sample {sid: "SAMPLE006"}), (e:Experiment {sid: "EXP003"})
MERGE (s)-[:HAS_EXPERIMENT]->(e);
MATCH (s:Sample {sid: "SAMPLE007"}), (e:Experiment {sid: "EXP003"})
MERGE (s)-[:HAS_EXPERIMENT]->(e);

// Connect samples to proteomics experiments
MATCH (s:Sample {sid: "SAMPLE001"}), (e:Experiment {sid: "EXP004"})
MERGE (s)-[:HAS_EXPERIMENT]->(e);
MATCH (s:Sample {sid: "SAMPLE002"}), (e:Experiment {sid: "EXP004"})
MERGE (s)-[:HAS_EXPERIMENT]->(e);
MATCH (s:Sample {sid: "SAMPLE005"}), (e:Experiment {sid: "EXP004"})
MERGE (s)-[:HAS_EXPERIMENT]->(e);

MATCH (s:Sample {sid: "SAMPLE003"}), (e:Experiment {sid: "EXP005"})
MERGE (s)-[:HAS_EXPERIMENT]->(e);
MATCH (s:Sample {sid: "SAMPLE004"}), (e:Experiment {sid: "EXP005"})
MERGE (s)-[:HAS_EXPERIMENT]->(e);

MATCH (s:Sample {sid: "SAMPLE006"}), (e:Experiment {sid: "EXP006"})
MERGE (s)-[:HAS_EXPERIMENT]->(e);
MATCH (s:Sample {sid: "SAMPLE007"}), (e:Experiment {sid: "EXP006"})
MERGE (s)-[:HAS_EXPERIMENT]->(e);

// Additional cross-platform experiment
MATCH (s:Sample {sid: "SAMPLE001"}), (e:Experiment {sid: "EXP007"})
MERGE (s)-[:HAS_EXPERIMENT]->(e);
MATCH (s:Sample {sid: "SAMPLE002"}), (e:Experiment {sid: "EXP007"})
MERGE (s)-[:HAS_EXPERIMENT]->(e);

// ============================================
// RELATIONSHIPS: COMPARISON -> EXPERIMENT
// ============================================

// Basic comparisons to experiments
MATCH (c:Comparison {sid: "COMP001"}), (e:Experiment {sid: "EXP001"})
MERGE (c)-[:COMPARES]->(e);

MATCH (c:Comparison {sid: "COMP002"}), (e:Experiment {sid: "EXP002"})
MERGE (c)-[:COMPARES]->(e);

MATCH (c:Comparison {sid: "COMP003"}), (e:Experiment {sid: "EXP003"})
MERGE (c)-[:COMPARES]->(e);

// Cross-tissue comparison
MATCH (c:Comparison {sid: "COMP004"}), (e:Experiment {sid: "EXP001"})
MERGE (c)-[:COMPARES]->(e);
MATCH (c:Comparison {sid: "COMP004"}), (e:Experiment {sid: "EXP002"})
MERGE (c)-[:COMPARES]->(e);

// Proteomics comparisons
MATCH (c:Comparison {sid: "COMP001"}), (e:Experiment {sid: "EXP004"})
MERGE (c)-[:COMPARES]->(e);

MATCH (c:Comparison {sid: "COMP002"}), (e:Experiment {sid: "EXP005"})
MERGE (c)-[:COMPARES]->(e);

MATCH (c:Comparison {sid: "COMP003"}), (e:Experiment {sid: "EXP006"})
MERGE (c)-[:COMPARES]->(e);

// Meta-analysis comparison (RNA-seq EXP001-EXP003 and proteomics EXP004-EXP006)
MATCH (c:Comparison {sid: "COMP008"}), (e:Experiment {sid: "EXP001"})
MERGE (c)-[:COMPARES]->(e);
MATCH (c:Comparison {sid: "COMP008"}), (e:Experiment {sid: "EXP002"})
MERGE (c)-[:COMPARES]->(e);
MATCH (c:Comparison {sid: "COMP008"}), (e:Experiment {sid: "EXP003"})
MERGE (c)-[:COMPARES]->(e);
MATCH (c:Comparison {sid: "COMP008"}), (e:Experiment {sid: "EXP004"})
MERGE (c)-[:COMPARES]->(e);
MATCH (c:Comparison {sid: "COMP008"}), (e:Experiment {sid: "EXP005"})
MERGE (c)-[:COMPARES]->(e);
MATCH (c:Comparison {sid: "COMP008"}), (e:Experiment {sid: "EXP006"})
MERGE (c)-[:COMPARES]->(e);

// Create multi-omics integration comparison
MERGE (comp9:Comparison {
  sid: "COMP009",
  name: "Multi-omics NAFLD Integration",
  type: "multi_omics_integration",
  tissue: "Liver",
  description: "Integrated transcriptomics and proteomics analysis of NAFLD",
  analysis_date: "2024-03-10"
});

MATCH (c:Comparison {sid: "COMP009"}), (e:Experiment {sid: "EXP001"})
MERGE (c)-[:COMPARES]->(e);
MATCH (c:Comparison {sid: "COMP009"}), (e:Experiment {sid: "EXP004"})
MERGE (c)-[:COMPARES]->(e);
MATCH (c:Comparison {sid: "COMP009"}), (e:Experiment {sid: "EXP007"})
MERGE (c)-[:COMPARES]->(e);

MATCH (c:Comparison {sid: "COMP009"}), (d:Disease {sid: "MONDO:0005359"})
MERGE (c)-[:STUDIES_DISEASE]->(d);

// ============================================
// RELATIONSHIPS: COMPARISON -> SAMPLE (Direct)
// ============================================

// COMP001: NAFLD vs Control samples
MATCH (c:Comparison {sid: "COMP001"}), (s:Sample {sid: "SAMPLE001"})
MERGE (c)-[:INCLUDES_CASE]->(s);
MATCH (c:Comparison {sid: "COMP001"}), (s:Sample {sid: "SAMPLE005"})
MERGE (c)-[:INCLUDES_CASE]->(s);
MATCH (c:Comparison {sid: "COMP001"}), (s:Sample {sid: "SAMPLE002"})
MERGE (c)-[:INCLUDES_CONTROL]->(s);

// COMP002: T2D vs Control samples
MATCH (c:Comparison {sid: "COMP002"}), (s:Sample {sid: "SAMPLE003"})
MERGE (c)-[:INCLUDES_CASE]->(s);
MATCH (c:Comparison {sid: "COMP002"}), (s:Sample {sid: "SAMPLE004"})
MERGE (c)-[:INCLUDES_CONTROL]->(s);

// COMP003: MetSyn vs Control samples
MATCH (c:Comparison {sid: "COMP003"}), (s:Sample {sid: "SAMPLE006"})
MERGE (c)-[:INCLUDES_CASE]->(s);
MATCH (c:Comparison {sid: "COMP003"}), (s:Sample {sid: "SAMPLE007"})
MERGE (c)-[:INCLUDES_CONTROL]->(s);

// COMP006: Phenotype-stratified (Insulin Resistant)
MATCH (c:Comparison {sid: "COMP006"}), (s:Sample)-[:HAS_PHENOTYPE]->(ph:EFO {sid: "EFO:0004220"})
MERGE (c)-[:INCLUDES_CASE]->(s);
MATCH (c:Comparison {sid: "COMP006"}), (s:Sample)
WHERE NOT (s)-[:HAS_PHENOTYPE]->(:EFO {sid: "EFO:0004220"})
  AND s.condition = "Healthy"
MERGE (c)-[:INCLUDES_CONTROL]->(s);

// ============================================
// RELATIONSHIPS: COMPARISON -> DISEASE
// ============================================

MATCH (c:Comparison {sid: "COMP001"}), (d:Disease {sid: "MONDO:0005359"})
MERGE (c)-[:STUDIES_DISEASE]->(d);

MATCH (c:Comparison {sid: "COMP002"}), (d:Disease {sid: "MONDO:0005015"})
MERGE (c)-[:STUDIES_DISEASE]->(d);

MATCH (c:Comparison {sid: "COMP003"}), (d:Disease {sid: "MONDO:0011382"})
MERGE (c)-[:STUDIES_DISEASE]->(d);

MATCH (c:Comparison {sid: "COMP004"}), (d:Disease {sid: "MONDO:0005359"})
MERGE (c)-[:STUDIES_DISEASE]->(d);
MATCH (c:Comparison {sid: "COMP004"}), (d:Disease {sid: "MONDO:0005015"})
MERGE (c)-[:STUDIES_DISEASE]->(d);

// Meta-analysis
MATCH (c:Comparison {sid: "COMP008"}), (d:Disease {sid: "MONDO:0005359"})
MERGE (c)-[:STUDIES_DISEASE]->(d);
MATCH (c:Comparison {sid: "COMP008"}), (d:Disease {sid: "MONDO:0005015"})
MERGE (c)-[:STUDIES_DISEASE]->(d);
MATCH (c:Comparison {sid: "COMP008"}), (d:Disease {sid: "MONDO:0011382"})
MERGE (c)-[:STUDIES_DISEASE]->(d);

// ============================================
// RELATIONSHIPS: EXPERIMENT -> GENE (with expression)
// ============================================

// EXP001 - NAFLD signature
MATCH (e:Experiment {sid: "EXP001"}), (g:Gene {symbol: "PIK3CG"})
MERGE (e)-[:HAS_VALUE {logFC: 2.5, pValue: 0.001, regulated: "up", baseMean: 1250.3}]->(g);

MATCH (e:Experiment {sid: "EXP001"}), (g:Gene {symbol: "CXCL10"})
MERGE (e)-[:HAS_VALUE {logFC: 3.2, pValue: 0.0005, regulated: "up", baseMean: 890.5}]->(g);

MATCH (e:Experiment {sid: "EXP001"}), (g:Gene {symbol: "CD36"})
MERGE (e)-[:HAS_VALUE {logFC: 2.8, pValue: 0.0008, regulated: "up", baseMean: 3200.1}]->(g);

MATCH (e:Experiment {sid: "EXP001"}), (g:Gene {symbol: "ALB"})
MERGE (e)-[:HAS_VALUE {logFC: -1.5, pValue: 0.02, regulated: "down", baseMean: 45000.8}]->(g);

// EXP002 - T2D signature
MATCH (e:Experiment {sid: "EXP002"}), (g:Gene {symbol: "MTOR"})
MERGE (e)-[:HAS_VALUE {logFC: -1.8, pValue: 0.01, regulated: "down", baseMean: 1580.2}]->(g);

MATCH (e:Experiment {sid: "EXP002"}), (g:Gene {symbol: "AKT1"})
MERGE (e)-[:HAS_VALUE {logFC: 2.1, pValue: 0.002, regulated: "up", baseMean: 2100.4}]->(g);

MATCH (e:Experiment {sid: "EXP002"}), (g:Gene {symbol: "PPARG"})
MERGE (e)-[:HAS_VALUE {logFC: -2.3, pValue: 0.0003, regulated: "down", baseMean: 980.6}]->(g);

MATCH (e:Experiment {sid: "EXP002"}), (g:Gene {symbol: "CXCL8"})
MERGE (e)-[:HAS_VALUE {logFC: 2.9, pValue: 0.0006, regulated: "up", baseMean: 1340.2}]->(g);

// EXP003 - Metabolic Syndrome signature
MATCH (e:Experiment {sid: "EXP003"}), (g:Gene {symbol: "CD36"})
MERGE (e)-[:HAS_VALUE {logFC: 3.5, pValue: 0.0001, regulated: "up", baseMean: 2890.7}]->(g);

MATCH (e:Experiment {sid: "EXP003"}), (g:Gene {symbol: "PIK3CG"})
MERGE (e)-[:HAS_VALUE {logFC: 1.9, pValue: 0.004, regulated: "up", baseMean: 1100.3}]->(g);

// ============================================
// RELATIONSHIPS: EXPERIMENT -> PROTEIN (Proteomics data)
// ============================================

// EXP004 - NAFLD Proteomics (TMT-MS)
MATCH (e:Experiment {sid: "EXP004"}), (p:Protein {sid: "P48736"})
MERGE (e)-[:HAS_VALUE {
  logFC: 2.3,
  pValue: 0.002,
  regulated: "up",
  intensity: 2.5e7,
  peptides: 12,
  coverage: 45.2
}]->(p);

MATCH (e:Experiment {sid: "EXP004"}), (p:Protein {sid: "P16671"})
MERGE (e)-[:HAS_VALUE {
  logFC: 3.1,
  pValue: 0.0003,
  regulated: "up",
  intensity: 5.8e7,
  peptides: 18,
  coverage: 62.1
}]->(p);

MATCH (e:Experiment {sid: "EXP004"}), (p:Protein {sid: "P02778"})
MERGE (e)-[:HAS_VALUE {
  logFC: 2.8,
  pValue: 0.0008,
  regulated: "up",
  intensity: 1.2e6,
  peptides: 8,
  coverage: 38.5
}]->(p);

MATCH (e:Experiment {sid: "EXP004"}), (p:Protein {sid: "P02768"})
MERGE (e)-[:HAS_VALUE {
  logFC: -1.6,
  pValue: 0.015,
  regulated: "down",
  intensity: 8.9e8,
  peptides: 35,
  coverage: 72.3
}]->(p);

MATCH (e:Experiment {sid: "EXP004"}), (p:Protein {sid: "P01375"})
MERGE (e)-[:HAS_VALUE {
  logFC: 2.5,
  pValue: 0.001,
  regulated: "up",
  intensity: 3.2e6,
  peptides: 6,
  coverage: 42.1
}]->(p);

MATCH (e:Experiment {sid: "EXP004"}), (p:Protein {sid: "P35354"})
MERGE (e)-[:HAS_VALUE {
  logFC: 3.4,
  pValue: 0.0002,
  regulated: "up",
  intensity: 4.5e6,
  peptides: 14,
  coverage: 56.8
}]->(p);

// EXP005 - T2D Proteomics (TMT-MS)
MATCH (e:Experiment {sid: "EXP005"}), (p:Protein {sid: "P42345"})
MERGE (e)-[:HAS_VALUE {
  logFC: -1.7,
  pValue: 0.012,
  regulated: "down",
  intensity: 1.8e7,
  peptides: 22,
  coverage: 38.2
}]->(p);

MATCH (e:Experiment {sid: "EXP005"}), (p:Protein {sid: "P31749"})
MERGE (e)-[:HAS_VALUE {
  logFC: 2.0,
  pValue: 0.003,
  regulated: "up",
  intensity: 2.1e7,
  peptides: 16,
  coverage: 52.4
}]->(p);

MATCH (e:Experiment {sid: "EXP005"}), (p:Protein {sid: "P37231"})
MERGE (e)-[:HAS_VALUE {
  logFC: -2.1,
  pValue: 0.0005,
  regulated: "down",
  intensity: 8.5e6,
  peptides: 11,
  coverage: 41.3
}]->(p);

MATCH (e:Experiment {sid: "EXP005"}), (p:Protein {sid: "P01308"})
MERGE (e)-[:HAS_VALUE {
  logFC: -3.2,
  pValue: 0.0001,
  regulated: "down",
  intensity: 5.2e5,
  peptides: 4,
  coverage: 48.6
}]->(p);

MATCH (e:Experiment {sid: "EXP005"}), (p:Protein {sid: "P48736"})
MERGE (e)-[:HAS_VALUE {
  logFC: 1.8,
  pValue: 0.008,
  regulated: "up",
  intensity: 1.9e7,
  peptides: 10,
  coverage: 43.7
}]->(p);

// EXP006 - Metabolic Syndrome Proteomics (Label-free)
MATCH (e:Experiment {sid: "EXP006"}), (p:Protein {sid: "P16671"})
MERGE (e)-[:HAS_VALUE {
  logFC: 4.2,
  pValue: 0.00005,
  regulated: "up",
  intensity: 7.8e7,
  peptides: 21,
  coverage: 68.9
}]->(p);

MATCH (e:Experiment {sid: "EXP006"}), (p:Protein {sid: "P48736"})
MERGE (e)-[:HAS_VALUE {
  logFC: 2.1,
  pValue: 0.002,
  regulated: "up",
  intensity: 2.8e7,
  peptides: 13,
  coverage: 47.2
}]->(p);

MATCH (e:Experiment {sid: "EXP006"}), (p:Protein {sid: "P37231"})
MERGE (e)-[:HAS_VALUE {
  logFC: -1.9,
  pValue: 0.006,
  regulated: "down",
  intensity: 6.5e6,
  peptides: 9,
  coverage: 38.1
}]->(p);

MATCH (e:Experiment {sid: "EXP006"}), (p:Protein {sid: "P01375"})
MERGE (e)-[:HAS_VALUE {
  logFC: 3.6,
  pValue: 0.0001,
  regulated: "up",
  intensity: 5.1e6,
  peptides: 7,
  coverage: 51.3
}]->(p);

// EXP007 - NAFLD DIA-PASEF Proteomics
MATCH (e:Experiment {sid: "EXP007"}), (p:Protein {sid: "P48736"})
MERGE (e)-[:HAS_VALUE {
  logFC: 2.4,
  pValue: 0.0015,
  regulated: "up",
  intensity: 3.2e7,
  peptides: 15,
  coverage: 51.8
}]->(p);

MATCH (e:Experiment {sid: "EXP007"}), (p:Protein {sid: "P16671"})
MERGE (e)-[:HAS_VALUE {
  logFC: 2.9,
  pValue: 0.0006,
  regulated: "up",
  intensity: 6.1e7,
  peptides: 19,
  coverage: 64.5
}]->(p);

MATCH (e:Experiment {sid: "EXP007"}), (p:Protein {sid: "P31749"})
MERGE (e)-[:HAS_VALUE {
  logFC: 1.7,
  pValue: 0.01,
  regulated: "up",
  intensity: 1.5e7,
  peptides: 12,
  coverage: 46.3
}]->(p);

MATCH (e:Experiment {sid: "EXP007"}), (p:Protein {sid: "P05067"})
MERGE (e)-[:HAS_VALUE {
  logFC: 1.4,
  pValue: 0.025,
  regulated: "up",
  intensity: 8.2e6,
  peptides: 24,
  coverage: 28.7
}]->(p);

MATCH (e:Experiment {sid: "EXP007"}), (p:Protein {sid: "P10636"})
MERGE (e)-[:HAS_VALUE {
  logFC: 1.6,
  pValue: 0.018,
  regulated: "up",
  intensity: 5.5e6,
  peptides: 11,
  coverage: 35.2
}]->(p);

MATCH (e:Experiment {sid: "EXP007"}), (p:Protein {sid: "P02768"})
MERGE (e)-[:HAS_VALUE {
  logFC: -1.5,
  pValue: 0.02,
  regulated: "down",
  intensity: 9.2e8,
  peptides: 38,
  coverage: 75.6
}]->(p);

// ============================================
// RELATIONSHIPS: COMPARISON -> GENE (DGE Results)
// ============================================

// COMP001 results (transcriptomics)
MATCH (c:Comparison {sid: "COMP001"}), (g:Gene {symbol: "PIK3CG"})
MERGE (c)-[:FOUND_DIFFERENTIAL {
  logFC: 2.5,
  pValue: 0.001,
  adjPValue: 0.015,
  regulated: "up",
  significance: "significant",
  data_type: "transcriptomics"
}]->(g);

MATCH (c:Comparison {sid: "COMP001"}), (g:Gene {symbol: "CXCL10"})
MERGE (c)-[:FOUND_DIFFERENTIAL {
  logFC: 3.2,
  pValue: 0.0005,
  adjPValue: 0.008,
  regulated: "up",
  significance: "significant",
  data_type: "transcriptomics"
}]->(g);

MATCH (c:Comparison {sid: "COMP001"}), (g:Gene {symbol: "CD36"})
MERGE (c)-[:FOUND_DIFFERENTIAL {
  logFC: 2.8,
  pValue: 0.0008,
  adjPValue: 0.012,
  regulated: "up",
  significance: "significant",
  data_type: "transcriptomics"
}]->(g);

// COMP001 results (proteomics)
MATCH (c:Comparison {sid: "COMP001"}), (p:Protein {sid: "P48736"})
MERGE (c)-[:FOUND_DIFFERENTIAL {
  logFC: 2.3,
  pValue: 0.002,
  adjPValue: 0.018,
  regulated: "up",
  significance: "significant",
  data_type: "proteomics"
}]->(p);

MATCH (c:Comparison {sid: "COMP001"}), (p:Protein {sid: "P16671"})
MERGE (c)-[:FOUND_DIFFERENTIAL {
  logFC: 3.1,
  pValue: 0.0003,
  adjPValue: 0.006,
  regulated: "up",
  significance: "significant",
  data_type: "proteomics"
}]->(p);

MATCH (c:Comparison {sid: "COMP001"}), (p:Protein {sid: "P02778"})
MERGE (c)-[:FOUND_DIFFERENTIAL {
  logFC: 2.8,
  pValue: 0.0008,
  adjPValue: 0.012,
  regulated: "up",
  significance: "significant",
  data_type: "proteomics"
}]->(p);

MATCH (c:Comparison {sid: "COMP001"}), (p:Protein {sid: "P01375"})
MERGE (c)-[:FOUND_DIFFERENTIAL {
  logFC: 2.5,
  pValue: 0.001,
  adjPValue: 0.015,
  regulated: "up",
  significance: "significant",
  data_type: "proteomics"
}]->(p);

// COMP002 results (transcriptomics)
MATCH (c:Comparison {sid: "COMP002"}), (g:Gene {symbol: "MTOR"})
MERGE (c)-[:FOUND_DIFFERENTIAL {
  logFC: -1.8,
  pValue: 0.01,
  adjPValue: 0.045,
  regulated: "down",
  significance: "significant",
  data_type: "transcriptomics"
}]->(g);

MATCH (c:Comparison {sid: "COMP002"}), (g:Gene {symbol: "AKT1"})
MERGE (c)-[:FOUND_DIFFERENTIAL {
  logFC: 2.1,
  pValue: 0.002,
  adjPValue: 0.018,
  regulated: "up",
  significance: "significant",
  data_type: "transcriptomics"
}]->(g);

// COMP002 results (proteomics)
MATCH (c:Comparison {sid: "COMP002"}), (p:Protein {sid: "P42345"})
MERGE (c)-[:FOUND_DIFFERENTIAL {
  logFC: -1.7,
  pValue: 0.012,
  adjPValue: 0.048,
  regulated: "down",
  significance: "significant",
  data_type: "proteomics"
}]->(p);

MATCH (c:Comparison {sid: "COMP002"}), (p:Protein {sid: "P31749"})
MERGE (c)-[:FOUND_DIFFERENTIAL {
  logFC: 2.0,
  pValue: 0.003,
  adjPValue: 0.022,
  regulated: "up",
  significance: "significant",
  data_type: "proteomics"
}]->(p);

MATCH (c:Comparison {sid: "COMP002"}), (p:Protein {sid: "P37231"})
MERGE (c)-[:FOUND_DIFFERENTIAL {
  logFC: -2.1,
  pValue: 0.0005,
  adjPValue: 0.008,
  regulated: "down",
  significance: "significant",
  data_type: "proteomics"
}]->(p);

// COMP008 - Meta-analysis (shared signatures across omics)
MATCH (c:Comparison {sid: "COMP008"}), (g:Gene {symbol: "PIK3CG"})
MERGE (c)-[:FOUND_DIFFERENTIAL {
  logFC: 2.2,
  pValue: 0.0001,
  adjPValue: 0.005,
  regulated: "up",
  significance: "significant",
  data_type: "meta_analysis",
  note: "Shared across NAFLD and MetSyn"
}]->(g);

MATCH (c:Comparison {sid: "COMP008"}), (p:Protein {sid: "P48736"})
MERGE (c)-[:FOUND_DIFFERENTIAL {
  logFC: 2.1,
  pValue: 0.0002,
  adjPValue: 0.006,
  regulated: "up",
  significance: "significant",
  data_type: "meta_analysis",
  note: "Consistent protein-level validation"
}]->(p);

// COMP009 - Multi-omics integration
MATCH (c:Comparison {sid: "COMP009"}), (g:Gene {symbol: "PIK3CG"})
MERGE (c)-[:FOUND_DIFFERENTIAL {
  logFC: 2.5,
  pValue: 0.001,
  adjPValue: 0.015,
  regulated: "up",
  significance: "significant",
  data_type: "transcriptomics",
  correlation: 0.85
}]->(g);

MATCH (c:Comparison {sid: "COMP009"}), (p:Protein {sid: "P48736"})
MERGE (c)-[:FOUND_DIFFERENTIAL {
  logFC: 2.35,
  pValue: 0.0018,
  adjPValue: 0.017,
  regulated: "up",
  significance: "significant",
  data_type: "proteomics",
  correlation: 0.85
}]->(p);

MATCH (c:Comparison {sid: "COMP009"}), (g:Gene {symbol: "CD36"})
MERGE (c)-[:FOUND_DIFFERENTIAL {
  logFC: 2.8,
  pValue: 0.0008,
  adjPValue: 0.012,
  regulated: "up",
  significance: "significant",
  data_type: "transcriptomics",
  correlation: 0.92
}]->(g);

MATCH (c:Comparison {sid: "COMP009"}), (p:Protein {sid: "P16671"})
MERGE (c)-[:FOUND_DIFFERENTIAL {
  logFC: 3.0,
  pValue: 0.0005,
  adjPValue: 0.008,
  regulated: "up",
  significance: "significant",
  data_type: "proteomics",
  correlation: 0.92
}]->(p);

// ============================================
// RELATIONSHIPS: GENE -> PROTEIN
// ============================================

MATCH (g:Gene {symbol: "PIK3CG"}), (p:Protein {sid: "P48736"})
MERGE (g)-[:CODES]->(p);
MATCH (g:Gene {symbol: "CXCL10"}), (p:Protein {sid: "P02778"})
MERGE (g)-[:CODES]->(p);
MATCH (g:Gene {symbol: "MTOR"}), (p:Protein {sid: "P42345"})
MERGE (g)-[:CODES]->(p);
MATCH (g:Gene {symbol: "AKT1"}), (p:Protein {sid: "P31749"})
MERGE (g)-[:CODES]->(p);
MATCH (g:Gene {symbol: "PPARG"}), (p:Protein {sid: "P37231"})
MERGE (g)-[:CODES]->(p);
MATCH (g:Gene {symbol: "CD36"}), (p:Protein {sid: "P16671"})
MERGE (g)-[:CODES]->(p);
MATCH (g:Gene {symbol: "ALB"}), (p:Protein {sid: "P02768"})
MERGE (g)-[:CODES]->(p);

// ============================================
// RELATIONSHIPS: GENE -> ID
// ============================================

MATCH (g:Gene {symbol: "PIK3CG"}), (id:ID {sid: "5294"})
MERGE (g)-[:MAPPED]->(id);
MATCH (g:Gene {symbol: "CXCL10"}), (id:ID {sid: "3627"})
MERGE (g)-[:MAPPED]->(id);
MATCH (g:Gene {symbol: "MTOR"}), (id:ID {sid: "2475"})
MERGE (g)-[:MAPPED]->(id);
MATCH (g:Gene {symbol: "AKT1"}), (id:ID {sid: "207"})
MERGE (g)-[:MAPPED]->(id);
MATCH (g:Gene {symbol: "PPARG"}), (id:ID {sid: "5468"})
MERGE (g)-[:MAPPED]->(id);

// ============================================
// RELATIONSHIPS: GENE -> DISEASE
// ============================================

MATCH (g:Gene {symbol: "PIK3CG"}), (d:Disease {sid: "MONDO:0005359"})
MERGE (g)-[:RELATED_TO {source: "DisGeNET", score: 0.75}]->(d);

MATCH (g:Gene {symbol: "PPARG"}), (d:Disease {sid: "MONDO:0005015"})
MERGE (g)-[:RELATED_TO {source: "OpenTargets", score: 0.85}]->(d);

MATCH (g:Gene {symbol: "MTOR"}), (d:Disease {sid: "MONDO:0005015"})
MERGE (g)-[:RELATED_TO {source: "DisGeNET", score: 0.68}]->(d);

MATCH (g:Gene {symbol: "CD36"}), (d:Disease {sid: "MONDO:0011382"})
MERGE (g)-[:RELATED_TO {source: "OpenTargets", score: 0.72}]->(d);

MATCH (g:Gene {symbol: "AKT1"}), (d:Disease {sid: "MONDO:0005015"})
MERGE (g)-[:RELATED_TO {source: "DisGeNET", score: 0.81}]->(d);

// ============================================
// RELATIONSHIPS: PROTEIN -> GO
// ============================================

MATCH (p:Protein {sid: "P48736"}), (go:GO {sid: "GO:0008286"})
MERGE (p)-[:ASSOCIATED_WITH {source: "GO"}]->(go);

MATCH (p:Protein {sid: "P02778"}), (go:GO {sid: "GO:0006954"})
MERGE (p)-[:ASSOCIATED_WITH {source: "GO"}]->(go);

MATCH (p:Protein {sid: "P42345"}), (go:GO {sid: "GO:0008286"})
MERGE (p)-[:ASSOCIATED_WITH {source: "GO"}]->(go);

MATCH (p:Protein {sid: "P31749"}), (go:GO {sid: "GO:0008286"})
MERGE (p)-[:ASSOCIATED_WITH {source: "GO"}]->(go);

MATCH (p:Protein {sid: "P37231"}), (go:GO {sid: "GO:0005158"})
MERGE (p)-[:ASSOCIATED_WITH {source: "GO"}]->(go);

MATCH (p:Protein {sid: "P16671"}), (go:GO {sid: "GO:0006629"})
MERGE (p)-[:ASSOCIATED_WITH {source: "GO"}]->(go);

MATCH (p:Protein {sid: "P02768"}), (go:GO {sid: "GO:0006955"})
MERGE (p)-[:ASSOCIATED_WITH {source: "GO"}]->(go);

// Additional GO term associations for pathway enrichment
MATCH (p:Protein {sid: "P48736"}), (go:GO {sid: "GO:0042593"})
MERGE (p)-[:ASSOCIATED_WITH {source: "GO"}]->(go);

MATCH (p:Protein {sid: "P31749"}), (go:GO {sid: "GO:0042593"})
MERGE (p)-[:ASSOCIATED_WITH {source: "GO"}]->(go);

MATCH (p:Protein {sid: "P42345"}), (go:GO {sid: "GO:0042593"})
MERGE (p)-[:ASSOCIATED_WITH {source: "GO"}]->(go);

MATCH (p:Protein {sid: "P37231"}), (go:GO {sid: "GO:0006629"})
MERGE (p)-[:ASSOCIATED_WITH {source: "GO"}]->(go);

MATCH (p:Protein {sid: "P16671"}), (go:GO {sid: "GO:0006629"})
MERGE (p)-[:ASSOCIATED_WITH {source: "GO"}]->(go);

MATCH (p:Protein {sid: "P01375"}), (go:GO {sid: "GO:0071356"})
MERGE (p)-[:ASSOCIATED_WITH {source: "GO"}]->(go);

MATCH (p:Protein {sid: "P35354"}), (go:GO {sid: "GO:0071356"})
MERGE (p)-[:ASSOCIATED_WITH {source: "GO"}]->(go);

MATCH (p:Protein {sid: "P02778"}), (go:GO {sid: "GO:0071356"})
MERGE (p)-[:ASSOCIATED_WITH {source: "GO"}]->(go);

MATCH (p:Protein {sid: "P01308"}), (go:GO {sid: "GO:0006006"})
MERGE (p)-[:ASSOCIATED_WITH {source: "GO"}]->(go);

MATCH (p:Protein {sid: "P01308"}), (go:GO {sid: "GO:0042593"})
MERGE (p)-[:ASSOCIATED_WITH {source: "GO"}]->(go);

MATCH (p:Protein {sid: "P37231"}), (go:GO {sid: "GO:0030154"})
MERGE (p)-[:ASSOCIATED_WITH {source: "GO"}]->(go);

MATCH (p:Protein {sid: "P37231"}), (go:GO {sid: "GO:0051091"})
MERGE (p)-[:ASSOCIATED_WITH {source: "GO"}]->(go);

// ============================================
// RELATIONSHIPS: GO -> PATHWAY (IS_PART_OF hierarchy)
// ============================================

// Insulin signaling GO terms to Pathways
MATCH (go:GO {sid: "GO:0008286"}), (pw:Pathway {sid: "KEGG:04910"})
MERGE (go)-[:IS_PART_OF]->(pw);

MATCH (go:GO {sid: "GO:0005158"}), (pw:Pathway {sid: "KEGG:04910"})
MERGE (go)-[:IS_PART_OF]->(pw);

MATCH (go:GO {sid: "GO:0042593"}), (pw:Pathway {sid: "KEGG:04910"})
MERGE (go)-[:IS_PART_OF]->(pw);

MATCH (go:GO {sid: "GO:0008286"}), (pw:Pathway {sid: "REACTOME:R-HSA-74751"})
MERGE (go)-[:IS_PART_OF]->(pw);

// PI3K-Akt pathway connections
MATCH (go:GO {sid: "GO:0008286"}), (pw:Pathway {sid: "KEGG:04151"})
MERGE (go)-[:IS_PART_OF]->(pw);

MATCH (go:GO {sid: "GO:0042593"}), (pw:Pathway {sid: "KEGG:04151"})
MERGE (go)-[:IS_PART_OF]->(pw);

MATCH (go:GO {sid: "GO:0043066"}), (pw:Pathway {sid: "KEGG:04151"})
MERGE (go)-[:IS_PART_OF]->(pw);

// mTOR pathway connections
MATCH (go:GO {sid: "GO:0042593"}), (pw:Pathway {sid: "KEGG:04150"})
MERGE (go)-[:IS_PART_OF]->(pw);

// Inflammatory pathway connections
MATCH (go:GO {sid: "GO:0006954"}), (pw:Pathway {sid: "KEGG:04064"})
MERGE (go)-[:IS_PART_OF]->(pw);

MATCH (go:GO {sid: "GO:0071356"}), (pw:Pathway {sid: "KEGG:04064"})
MERGE (go)-[:IS_PART_OF]->(pw);

MATCH (go:GO {sid: "GO:0006954"}), (pw:Pathway {sid: "WIKIPATHWAYS:WP1471"})
MERGE (go)-[:IS_PART_OF]->(pw);

MATCH (go:GO {sid: "GO:0071356"}), (pw:Pathway {sid: "WIKIPATHWAYS:WP1471"})
MERGE (go)-[:IS_PART_OF]->(pw);

MATCH (go:GO {sid: "GO:0006955"}), (pw:Pathway {sid: "WIKIPATHWAYS:WP1471"})
MERGE (go)-[:IS_PART_OF]->(pw);

// Lipid metabolism pathways
MATCH (go:GO {sid: "GO:0006629"}), (pw:Pathway {sid: "KEGG:04920"})
MERGE (go)-[:IS_PART_OF]->(pw);

MATCH (go:GO {sid: "GO:0006629"}), (pw:Pathway {sid: "KEGG:03320"})
MERGE (go)-[:IS_PART_OF]->(pw);

// PPAR signaling
MATCH (go:GO {sid: "GO:0030154"}), (pw:Pathway {sid: "KEGG:03320"})
MERGE (go)-[:IS_PART_OF]->(pw);

MATCH (go:GO {sid: "GO:0051091"}), (pw:Pathway {sid: "KEGG:03320"})
MERGE (go)-[:IS_PART_OF]->(pw);

// Glucose metabolism
MATCH (go:GO {sid: "GO:0006006"}), (pw:Pathway {sid: "KEGG:04910"})
MERGE (go)-[:IS_PART_OF]->(pw);

MATCH (go:GO {sid: "GO:0006006"}), (pw:Pathway {sid: "KEGG:04920"})
MERGE (go)-[:IS_PART_OF]->(pw);

// Interleukin signaling
MATCH (go:GO {sid: "GO:0006954"}), (pw:Pathway {sid: "REACTOME:R-HSA-449147"})
MERGE (go)-[:IS_PART_OF]->(pw);

MATCH (go:GO {sid: "GO:0071356"}), (pw:Pathway {sid: "REACTOME:R-HSA-449147"})
MERGE (go)-[:IS_PART_OF]->(pw);

// ============================================
// RELATIONSHIPS: DISEASE -> PATHWAY (direct disease-pathway associations)
// ============================================

MATCH (d:Disease {sid: "MONDO:0005359"}), (pw:Pathway {sid: "KEGG:04920"})
MERGE (d)-[:INVOLVES_PATHWAY {evidence: "literature"}]->(pw);

MATCH (d:Disease {sid: "MONDO:0005359"}), (pw:Pathway {sid: "KEGG:03320"})
MERGE (d)-[:INVOLVES_PATHWAY {evidence: "literature"}]->(pw);

MATCH (d:Disease {sid: "MONDO:0005359"}), (pw:Pathway {sid: "KEGG:04151"})
MERGE (d)-[:INVOLVES_PATHWAY {evidence: "literature"}]->(pw);

MATCH (d:Disease {sid: "MONDO:0005015"}), (pw:Pathway {sid: "KEGG:04910"})
MERGE (d)-[:INVOLVES_PATHWAY {evidence: "literature"}]->(pw);

MATCH (d:Disease {sid: "MONDO:0005015"}), (pw:Pathway {sid: "KEGG:04151"})
MERGE (d)-[:INVOLVES_PATHWAY {evidence: "literature"}]->(pw);

MATCH (d:Disease {sid: "MONDO:0005015"}), (pw:Pathway {sid: "KEGG:04920"})
MERGE (d)-[:INVOLVES_PATHWAY {evidence: "literature"}]->(pw);

MATCH (d:Disease {sid: "MONDO:0011382"}), (pw:Pathway {sid: "KEGG:04920"})
MERGE (d)-[:INVOLVES_PATHWAY {evidence: "literature"}]->(pw);

MATCH (d:Disease {sid: "MONDO:0011382"}), (pw:Pathway {sid: "KEGG:04910"})
MERGE (d)-[:INVOLVES_PATHWAY {evidence: "literature"}]->(pw);

// ============================================
// RELATIONSHIPS: PROTEIN-PROTEIN INTERACTIONS
// ============================================

MATCH (p1:Protein {sid: "P48736"}), (p2:Protein {sid: "P31749"})
MERGE (p1)-[:INTERACTS_WITH {source: "STRING", score: 0.9}]->(p2);

MATCH (p1:Protein {sid: "P31749"}), (p2:Protein {sid: "P42345"})
MERGE (p1)-[:INTERACTS_WITH {source: "STRING", score: 0.95}]->(p2);

MATCH (p1:Protein {sid: "P31749"}), (p2:Protein {sid: "P37231"})
MERGE (p1)-[:INTERACTS_WITH {source: "STRING", score: 0.78}]->(p2);

MATCH (p1:Protein {sid: "P16671"}), (p2:Protein {sid: "P48736"})
MERGE (p1)-[:INTERACTS_WITH {source: "STRING", score: 0.65}]->(p2);

// Extended PPI network for neighborhood analysis
MATCH (p1:Protein {sid: "P48736"}), (p2:Protein {sid: "P42345"})
MERGE (p1)-[:INTERACTS_WITH {source: "STRING", score: 0.72, evidence: "experimental"}]->(p2);

MATCH (p1:Protein {sid: "P42345"}), (p2:Protein {sid: "P37231"})
MERGE (p1)-[:INTERACTS_WITH {source: "STRING", score: 0.81, evidence: "experimental"}]->(p2);

MATCH (p1:Protein {sid: "P31749"}), (p2:Protein {sid: "P01308"})
MERGE (p1)-[:INTERACTS_WITH {source: "STRING", score: 0.88, evidence: "experimental"}]->(p2);

MATCH (p1:Protein {sid: "P37231"}), (p2:Protein {sid: "P01308"})
MERGE (p1)-[:INTERACTS_WITH {source: "STRING", score: 0.76, evidence: "database"}]->(p2);

MATCH (p1:Protein {sid: "P16671"}), (p2:Protein {sid: "P31749"})
MERGE (p1)-[:INTERACTS_WITH {source: "STRING", score: 0.68, evidence: "experimental"}]->(p2);

MATCH (p1:Protein {sid: "P16671"}), (p2:Protein {sid: "P37231"})
MERGE (p1)-[:INTERACTS_WITH {source: "STRING", score: 0.55, evidence: "co-expression"}]->(p2);

MATCH (p1:Protein {sid: "P02778"}), (p2:Protein {sid: "P01375"})
MERGE (p1)-[:INTERACTS_WITH {source: "STRING", score: 0.82, evidence: "experimental"}]->(p2);

MATCH (p1:Protein {sid: "P01375"}), (p2:Protein {sid: "P35354"})
MERGE (p1)-[:INTERACTS_WITH {source: "STRING", score: 0.91, evidence: "experimental"}]->(p2);

MATCH (p1:Protein {sid: "P02778"}), (p2:Protein {sid: "P35354"})
MERGE (p1)-[:INTERACTS_WITH {source: "STRING", score: 0.73, evidence: "co-expression"}]->(p2);

MATCH (p1:Protein {sid: "P48736"}), (p2:Protein {sid: "P02768"})
MERGE (p1)-[:INTERACTS_WITH {source: "STRING", score: 0.42, evidence: "text-mining"}]->(p2);

MATCH (p1:Protein {sid: "P05067"}), (p2:Protein {sid: "P10636"})
MERGE (p1)-[:INTERACTS_WITH {source: "STRING", score: 0.85, evidence: "experimental"}]->(p2);

MATCH (p1:Protein {sid: "P31749"}), (p2:Protein {sid: "P05067"})
MERGE (p1)-[:INTERACTS_WITH {source: "STRING", score: 0.67, evidence: "database"}]->(p2);

MATCH (p1:Protein {sid: "P42345"}), (p2:Protein {sid: "P05067"})
MERGE (p1)-[:INTERACTS_WITH {source: "STRING", score: 0.71, evidence: "database"}]->(p2);

// Bidirectional interactions (making network undirected):
// reverse-direction copies of the first four interactions above.
// Queries that match -[:INTERACTS_WITH]- without a direction will traverse both copies.
MATCH (p1:Protein {sid: "P31749"}), (p2:Protein {sid: "P48736"})
MERGE (p1)-[:INTERACTS_WITH {source: "STRING", score: 0.9}]->(p2);

MATCH (p1:Protein {sid: "P42345"}), (p2:Protein {sid: "P31749"})
MERGE (p1)-[:INTERACTS_WITH {source: "STRING", score: 0.95}]->(p2);

MATCH (p1:Protein {sid: "P37231"}), (p2:Protein {sid: "P31749"})
MERGE (p1)-[:INTERACTS_WITH {source: "STRING", score: 0.78}]->(p2);

MATCH (p1:Protein {sid: "P48736"}), (p2:Protein {sid: "P16671"})
MERGE (p1)-[:INTERACTS_WITH {source: "STRING", score: 0.65}]->(p2);

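Cypher queries

Below are some example queries you can use to draw insights from the data.

Key examples in the dataset

These are a few key examples from the dataset, together with queries that surface them.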

NAFLD concordant overexpression (COMP001)

  • PIK3CG: mRNA +2.5 FC ↔ protein +2.3 FC ✓ concordant

  • CD36: mRNA +2.8 FC ↔ protein +3.1 FC ✓ concordant

  • CXCL10: mRNA +3.2 FC ↔ protein +2.8 FC ✓ concordant

WITH
  "COMP001" AS sid,
  0 AS logFCThreshold
MATCH (c:Comparison {sid: sid})-[fd1:FOUND_DIFFERENTIAL]->(g:Gene)-[:CODES]->(p:Protein)<-[fd2:FOUND_DIFFERENTIAL]-(c)
RETURN
  g.symbol AS Gene,
  g.name AS GeneName,
  fd1.logFC AS `mRNA Fold Change`,
  fd2.logFC AS `Protein Fold Change`,
  CASE WHEN fd1.logFC > logFCThreshold AND fd2.logFC > logFCThreshold THEN 'Concordant (Overexpression)'
       WHEN fd1.logFC < logFCThreshold AND fd2.logFC < logFCThreshold THEN 'Concordant (Underexpression)'
       ELSE 'Discordant' END AS Regulation;
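The `CASE` expression labels each gene purely from the signs of its two fold changes relative to `logFCThreshold`. A minimal Python sketch of the same rule, checked against the example COMP001 values listed for this comparison; the function name is hypothetical, and it mirrors the query's strict comparisons, so a value exactly equal to the threshold counts as discordant.

```python
def classify_regulation(mrna_logfc: float, protein_logfc: float, threshold: float = 0.0) -> str:
    """Label a gene the way the COMP001 query's CASE expression does."""
    if mrna_logfc > threshold and protein_logfc > threshold:
        return "Concordant (Overexpression)"
    if mrna_logfc < threshold and protein_logfc < threshold:
        return "Concordant (Underexpression)"
    # Opposite signs, or at least one value exactly at the threshold
    return "Discordant"


# Example (mRNA, protein) values listed for COMP001
COMP001 = {
    "PIK3CG": (2.5, 2.3),
    "CD36": (2.8, 3.1),
    "CXCL10": (3.2, 2.8),
}

if __name__ == "__main__":
    for gene, (mrna, protein) in COMP001.items():
        print(gene, classify_regulation(mrna, protein))
```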

T2D (type 2 diabetes) concordant overexpression (COMP002)

  • AKT1: mRNA +2.1 FC ↔ protein +2.0 FC ✓ concordant

WITH
  "COMP002" AS sid,
  0 AS logFCThreshold
MATCH
  (c:Comparison {sid: sid})-[fd1:FOUND_DIFFERENTIAL]->(g:Gene)-[:CODES]->(p:Protein)<-[fd2:FOUND_DIFFERENTIAL]-(c)
WHERE
  fd1.logFC > logFCThreshold AND fd2.logFC > logFCThreshold
RETURN
  g.symbol AS Gene,
  g.name AS GeneName,
  fd1.logFC AS `mRNA Fold Change`,
  fd2.logFC AS `Protein Fold Change`

Multi-omics integration (COMP009)

  • Shows the correlation between transcript and protein levels, with correlation coefficients between 0.85 and 0.92

MATCH (c:Comparison {sid: 'COMP009'})-[fd:FOUND_DIFFERENTIAL]->(p:Protein)
WHERE fd.correlation >= 0.85 AND fd.correlation <= 0.92
RETURN c,fd,p
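This section does not state how the `correlation` values on the COMP009 edges were computed. Assuming they are Pearson coefficients between a gene's mRNA and protein abundances measured across the same samples, the calculation and the inclusive range filter used in the query can be sketched in Python; the function names and sample values are hypothetical.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def in_reported_range(r: float, low: float = 0.85, high: float = 0.92) -> bool:
    """Same inclusive bounds as the COMP009 query's WHERE clause."""
    return low <= r <= high


# Hypothetical per-sample abundances for one gene (mRNA vs protein)
MRNA = [10.1, 11.4, 9.8, 12.0, 10.9]
PROTEIN = [20.3, 22.1, 19.9, 23.5, 21.0]
```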

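Proteomics-only changes: identifying post-transcriptional regulation

Find proteins that change significantly (|logFC| > 1.5, p < 0.05) while the mRNA change recorded for their gene in the same comparison points in a different direction.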

// Proteomics-only changes - Identify post-transcriptional regulation
WITH
  "Proteomics" AS experimentType,
  1.5 AS logFCThreshold,
  0.05 AS pValueThreshold
MATCH
  (exp:Experiment {type: experimentType})-[r:HAS_VALUE]->(protein:Protein),
  (protein)<-[:CODES]-(gene:Gene)<-[fd:FOUND_DIFFERENTIAL]-(comp:Comparison)-[:COMPARES]->(exp)
WHERE
  r.pValue < pValueThreshold AND abs(r.logFC) > logFCThreshold
  AND fd.regulated <> r.regulated
RETURN
  protein.name AS Protein,
  gene.symbol AS Gene,
  r.logFC AS logFC,
  r.pValue AS PValue,
  r.regulated AS Direction;
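As written, the query only reports proteins whose direction differs from the mRNA call; a protein that changes while its gene has no differential mRNA record never satisfies the `FOUND_DIFFERENTIAL` part of the `MATCH`. A Python sketch that separates the two cases, assuming per-gene records shaped like the `HAS_VALUE`/`FOUND_DIFFERENTIAL` properties (`logFC`, `pValue`, `regulated`); the function name and dict layout are hypothetical.

```python
from typing import Optional


def post_transcriptional_flag(
    protein: dict,
    mrna: Optional[dict],
    logfc_threshold: float = 1.5,
    p_threshold: float = 0.05,
) -> Optional[str]:
    """Label a protein-level change that its mRNA does not explain.

    `protein` and `mrna` are dicts with keys "logFC", "pValue" and "regulated"
    ("up"/"down"); `mrna` is None when no differential mRNA result exists.
    Returns None when the protein change is not significant or matches the mRNA.
    """
    significant = protein["pValue"] < p_threshold and abs(protein["logFC"]) > logfc_threshold
    if not significant:
        return None
    if mrna is None:
        return "protein-only"  # not returned by the Cypher query above
    if mrna["regulated"] != protein["regulated"]:
        return "discordant"  # the case the Cypher query returns
    return None  # concordant with the mRNA change
```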

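Which proteins/genes are associated with a specific disease?

Starting from a specific disease, find the associated genes and proteins, along with the biological processes they are involved in.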

// Find proteins associated with a given disease (e.g., Non-alcoholic fatty liver disease)
WITH "Non-alcoholic fatty liver disease" AS diseaseName
MATCH (d:Disease {name: diseaseName})<-[:RELATED_TO]-(g:Gene)-[:CODES]->(p:Protein)-[:ASSOCIATED_WITH]->(go:GO)
RETURN
  g.symbol AS Gene,
  p.sid AS Protein,
  go.name AS `Biological Process`
ORDER BY Gene;

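Which proteins/genes are associated with a specific phenotype?

We can also start from a specific phenotype and find genes that change significantly (p < 0.05) in experiments on samples annotated with that phenotype.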

// Find genes from experiments on samples with specific phenotype (e.g., elevated triglycerides)
WITH "Elevated triglycerides" AS phenotypeName
MATCH (ph:EFO {name: phenotypeName})<-[:HAS_PHENOTYPE]-(s:Sample)-[:HAS_EXPERIMENT]->(e:Experiment)-[hv:HAS_VALUE]->(g:Gene)
WHERE hv.pValue < 0.05
RETURN
  g.symbol AS Gene,
  hv.logFC AS logFC,
  hv.pValue AS PValue
ORDER BY hv.pValue;

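Which diseases or phenotypes are associated with a given gene/protein (i.e., what else might targeting this protein affect)?

Starting from a gene or protein of interest, find the diseases linked to it.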

// Find diseases linked to a given gene/protein
WITH "PIK3CG" AS geneSymbol
MATCH (g:Gene {symbol: geneSymbol})-[:RELATED_TO]-(d:Disease)
RETURN d.name AS Disease
ORDER BY Disease;

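Which phenotypes are associated with a given gene/protein (i.e., what else might targeting this protein affect)?

Starting from a Gene, find the phenotypes of the samples used in experiments that measured that gene.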

// Find phenotypes linked to a given gene/protein
WITH "PIK3CG" AS geneSymbol
MATCH (g:Gene {symbol: geneSymbol})<-[:HAS_VALUE]-(:Experiment)<-[:HAS_EXPERIMENT]-(:Sample)-[:HAS_PHENOTYPE]->(ph:EFO)
RETURN
  DISTINCT g.symbol AS Gene,
  ph.name AS Phenotype,
  ph.sid AS PhenotypeID;

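Which genes are overexpressed at both the mRNA (transcript) and protein levels in a given disease?

Starting from a disease (e.g., non-alcoholic fatty liver disease), find genes that are upregulated at both the transcript and the protein level.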

// Which genes are overexpressed at both the mRNA (transcript) and protein levels in a given disease?
WITH
  "Non-alcoholic fatty liver disease" AS diseaseName,
  "up" AS regulationDirection
// The gene (mRNA) and its encoded protein must both be differential in the same comparison
MATCH
  (disease:Disease {name: diseaseName}),
  (comp:Comparison)-[:STUDIES_DISEASE]->(disease),
  (comp)-[fd:FOUND_DIFFERENTIAL]->(gene:Gene),
  (gene)-[:CODES]->(protein:Protein)<-[pfd:FOUND_DIFFERENTIAL]-(comp)
WHERE fd.regulated = regulationDirection
  AND pfd.regulated = regulationDirection
RETURN
  DISTINCT gene.symbol AS Gene,
  gene.name AS GeneName,
  fd.logFC AS mRNA_logFC,
  fd.adjPValue AS mRNA_AdjPValue,
  pfd.logFC AS Protein_logFC;

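Which proteins are in the network neighborhood of a candidate gene (e.g., interacting proteins, upstream regulators, downstream effectors)?

Starting from a gene of interest (e.g., PIK3CG), find the proteins that interact directly with its product. The INTERACTS_WITH edges carry STRING scores but no regulatory direction, so this query does not separate upstream regulators from downstream effectors.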

// Find direct interacting proteins for PIK3CG
WITH "PIK3CG" AS targetGeneSymbol
MATCH
  (g:Gene {symbol: targetGeneSymbol})-[:CODES]->(p:Protein)-[i:INTERACTS_WITH]-(neighbor:Protein)<-[:CODES]-(ng:Gene)
RETURN
  g.symbol AS TargetGene,
  ng.symbol AS NeighborGene,
  neighbor.sid AS NeighborProtein,
  i.score AS InteractionScore
ORDER BY i.score DESC;
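Because the query matches `-[:INTERACTS_WITH]-` without a direction, the way each interaction was stored does not matter. The same idea, extended to the 1-2 hop neighborhoods used in later queries, can be sketched in Python as a breadth-first search over the demo interaction pairs created above (UniProt accessions). Unlike the Cypher `*1..2` pattern, which returns one row per path, this sketch keeps only the shortest hop distance per protein; the helper names are hypothetical.

```python
from collections import deque

# Undirected PPI pairs from the demo INTERACTS_WITH statements
PPI_EDGES = [
    ("P48736", "P31749"), ("P31749", "P42345"), ("P31749", "P37231"),
    ("P16671", "P48736"), ("P48736", "P42345"), ("P42345", "P37231"),
    ("P31749", "P01308"), ("P37231", "P01308"), ("P16671", "P31749"),
    ("P16671", "P37231"), ("P02778", "P01375"), ("P01375", "P35354"),
    ("P02778", "P35354"), ("P48736", "P02768"), ("P05067", "P10636"),
    ("P31749", "P05067"), ("P42345", "P05067"),
]


def build_adjacency(edges):
    """Treat every interaction as undirected, like -[:INTERACTS_WITH]- in Cypher."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def neighborhood(adjacency, start, max_hops=2):
    """Map each protein within max_hops of start to its shortest hop distance."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distances[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in adjacency.get(node, ()):
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    del distances[start]  # the start protein is not its own neighbor
    return distances
```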

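Which under-explored but potentially druggable targets lie near disease nodes?

This query looks for proteins 1-2 interaction hops away from known disease-associated proteins that are not yet annotated as drug targets, pointing to potential new targets.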

// Find proteins 1-2 PPI hops away from disease-associated proteins that aren't annotated to drugs
MATCH (d:Disease)<-[:RELATED_TO]-(g:Gene)-[:CODES]->(p:Protein)-[ppi:INTERACTS_WITH*1..2]-(neighbor:Protein)<-[:CODES]-(ng:Gene)
WHERE neighbor <> p
  AND NOT (neighbor)-[:ASSOCIATED_WITH]->(:Drug)
// Distance counts PPI hops only, keeping the shortest path to each neighbor
RETURN
  g.symbol AS TargetGene,
  ng.symbol AS NeighborGene,
  neighbor.sid AS NeighborProtein,
  min(size(ppi)) AS Distance
ORDER BY Distance;

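Among the candidate targets for a disease, which have the strongest support across multiple evidence types (genetics, expression, pathway involvement, literature, etc.)?

This query covers one of these evidence types, shared GO annotation: starting from a disease of interest, it finds new candidate genes through GO terms they share with known disease-associated genes, and filters out genes already linked to the disease.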

// Find novel candidate genes for Type 2 Diabetes through shared GO pathways
// with known disease-associated genes
WITH "Type 2 Diabetes" AS diseaseName
MATCH
  (disease:Disease {name: diseaseName})<-[:RELATED_TO]-(known:Gene)-[:CODES]->(kp:Protein)-[:ASSOCIATED_WITH]->(go:GO),
  (go)<-[:ASSOCIATED_WITH]-(cp:Protein)<-[:CODES]-(candidate:Gene)
WHERE
  NOT (candidate)-[:RELATED_TO]->(disease)
  AND candidate <> known
WITH
  candidate, disease, count(DISTINCT go) AS sharedPathways, collect(DISTINCT go.name) AS pathways
WHERE sharedPathways >= 1  // Reduced from 2 to 1 due to smaller demo dataset
RETURN
  candidate.symbol AS `Novel Candidate`,
  candidate.name AS `Candidate Name`,
  sharedPathways AS `Shared Pathway Count`,
  pathways AS `Shared Pathways`
ORDER BY `Shared Pathway Count` DESC
LIMIT 10;
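The ranking logic of this query can be sketched in Python over the demo `ASSOCIATED_WITH` annotations, working at the protein level rather than the gene level for brevity. The known Type 2 Diabetes proteins are assumed to be the products of PPARG, MTOR and AKT1 (P37231, P42345, P31749), which the demo data relates to MONDO:0005015; every other protein is scored by how many GO terms it shares with them. The function name is hypothetical.

```python
# GO annotations per protein, from the demo ASSOCIATED_WITH statements
PROTEIN_GO = {
    "P48736": {"GO:0008286", "GO:0042593"},
    "P02778": {"GO:0006954", "GO:0071356"},
    "P42345": {"GO:0008286", "GO:0042593"},
    "P31749": {"GO:0008286", "GO:0042593"},
    "P37231": {"GO:0005158", "GO:0006629", "GO:0030154", "GO:0051091"},
    "P16671": {"GO:0006629"},
    "P02768": {"GO:0006955"},
    "P01375": {"GO:0071356"},
    "P35354": {"GO:0071356"},
    "P01308": {"GO:0006006", "GO:0042593"},
}

# Products of PPARG, MTOR and AKT1, related to MONDO:0005015 in the demo data
T2D_KNOWN = {"P37231", "P42345", "P31749"}


def rank_candidates(protein_go, known, min_shared=1):
    """Rank proteins outside `known` by the number of GO terms shared with `known`."""
    known_terms = set().union(*(protein_go[p] for p in known))
    ranked = []
    for protein, terms in protein_go.items():
        if protein in known:
            continue  # skip proteins already linked to the disease
        shared = terms & known_terms
        if len(shared) >= min_shared:
            ranked.append((protein, len(shared), sorted(shared)))
    # Most shared terms first; ties broken by accession for a stable order
    ranked.sort(key=lambda row: (-row[1], row[0]))
    return ranked
```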

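Which candidate targets are expressed (or dysregulated) in disease-relevant tissues, cell types, or biological contexts?

In this query we look for genes that are co-expressed with known disease genes across several experiments: significant (p < 0.05) in the same experiment and regulated in the same direction. This is one way to identify genes that may be functionally involved in the disease process.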

// Find genes co-expressed with known disease genes across experiments
WITH
  0.05 AS pValueThreshold,
  2 AS minCoExpressionCount
MATCH (disease:Disease)<-[:RELATED_TO]-(known:Gene)<-[hv1:HAS_VALUE]-(experiment:Experiment)-[hv2:HAS_VALUE]->(candidate:Gene)
WHERE
  NOT (candidate)-[:RELATED_TO]->(disease)
  AND hv1.regulated = hv2.regulated
  AND hv1.pValue < pValueThreshold AND hv2.pValue < pValueThreshold
WITH
  candidate,
  disease,
  count(DISTINCT experiment) AS coExpressionCount
WHERE coExpressionCount >= minCoExpressionCount
RETURN
  candidate.symbol AS NovelCandidate,
  disease.name AS Disease,
  coExpressionCount AS TimesCoExpressed
ORDER BY coExpressionCount DESC;
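The counting step of the co-expression query can be sketched in Python: within each experiment, a candidate gene scores when it and at least one known disease gene pass the p-value cutoff with the same regulation direction, and each experiment counts once per candidate (the equivalent of `count(DISTINCT experiment)`). The input layout, function name and gene names in the example are hypothetical.

```python
from collections import defaultdict

# Hypothetical input: experiment id -> {gene: (regulated, pValue)}
EXPERIMENTS = {
    "EXP_A": {"KNOWN1": ("up", 0.01), "GENE_X": ("up", 0.02), "GENE_Y": ("down", 0.01)},
    "EXP_B": {"KNOWN1": ("up", 0.03), "GENE_X": ("up", 0.04), "GENE_Y": ("up", 0.20)},
    "EXP_C": {"KNOWN1": ("down", 0.01), "GENE_Y": ("down", 0.01)},
}


def co_expression_counts(experiments, known_genes, p_threshold=0.05, min_count=2):
    """Count experiments in which each candidate is co-regulated with a known gene."""
    hits = defaultdict(set)
    for exp_id, values in experiments.items():
        # Keep only significant measurements in this experiment
        significant = {g: reg for g, (reg, p) in values.items() if p < p_threshold}
        known_dirs = {significant[g] for g in known_genes if g in significant}
        for gene, reg in significant.items():
            if gene not in known_genes and reg in known_dirs:
                hits[gene].add(exp_id)  # a set, so each experiment counts once
    return {g: len(e) for g, e in hits.items() if len(e) >= min_count}
```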

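Which proteins are in the network neighborhood of a candidate target?

Starting from a gene of interest (e.g., PIK3CG), find the proteins it interacts with directly, and the genes that encode them.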

// Which proteins are in the network neighborhood of a candidate target?
WITH "PIK3CG" AS targetGeneSymbol
MATCH
  (candidate:Gene {symbol: targetGeneSymbol})-[:CODES]->(p:Protein)-[i:INTERACTS_WITH]-(neighborProtein:Protein)<-[:CODES]-(neighbourGene:Gene)
RETURN
  candidate.symbol AS `Candidate Gene`,
  neighbourGene.symbol AS `Neighbor Gene`,
  neighborProtein.sid AS `Neighbor Protein`,
  i.score AS `Interaction Score`
ORDER BY `Interaction Score` DESC;

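What are the protein-protein interactions within 1-2 hops of the overexpressed genes in my RNA-seq data?

We start from a specific overexpressed Gene identified in the RNA-seq data (e.g., PIK3CG), find the interactions of its Protein, and explore its interaction network up to 2 hops away, focusing on proteins encoded by other overexpressed genes. This lets us consider interventions on the gene's immediate network, not only on its primary product.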

// What are the protein-protein interactions in 1–2 hops of my overexpressed genes from my RNA-seq?
WITH
  "PIK3CG" AS geneSymbol,
  0 AS logFCThreshold
MATCH
  (:Comparison)-[fd1:FOUND_DIFFERENTIAL]->(gene:Gene {symbol: geneSymbol})-[:CODES]->(p:Protein)-[i:INTERACTS_WITH*1..2]-(neighbor:Protein)<-[:CODES]-(neighbourGene:Gene)<-[fd2:FOUND_DIFFERENTIAL]-(:Comparison)
WHERE
  fd1.logFC > logFCThreshold AND fd2.logFC > logFCThreshold
  AND neighbor <> p
// Distance counts PPI hops only, keeping the shortest path to each neighbor
RETURN
  gene.symbol AS TargetGene,
  neighbourGene.symbol AS NeighborGene,
  neighbor.sid AS NeighborProtein,
  min(size(i)) AS Distance
ORDER BY Distance;

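Multi-omics correlation: genes with matching transcriptomics and proteomics changes

Starting from a specific Comparison, find genes whose mRNA and protein levels both change significantly and in the same direction. Correlation_Delta is the absolute difference between the two logFC values, so smaller values mean closer agreement.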

// Multi-omics correlation - genes with matching transcriptomics and proteomics changes
WITH
  "COMP001" AS comparisonSid,
  "significant" AS significanceLevel
MATCH
  (comp:Comparison {sid: comparisonSid})-[r1:FOUND_DIFFERENTIAL]->(gene:Gene),
  (gene)-[:CODES]->(protein:Protein)<-[r2:FOUND_DIFFERENTIAL]-(comp)
WHERE
  r1.significance = significanceLevel
  AND r2.significance = significanceLevel
  AND r1.regulated = r2.regulated
RETURN
  gene.symbol AS Gene,
  protein.name AS Protein,
  r1.logFC AS mRNA_LogFC,
  r2.logFC AS Protein_LogFC,
  r1.regulated AS Direction,
  abs(r1.logFC - r2.logFC) AS Correlation_Delta
ORDER BY Correlation_Delta;

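Proteomics data quality metrics

Assess the quality of each proteomics experiment from the number of proteins detected, average sequence coverage, peptide counts and MS intensity.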

// Proteomics data quality metrics
WITH "Proteomics" AS type
MATCH
  (exp:Experiment {type: type})-[r:HAS_VALUE]->(protein:Protein)
WITH
  exp,
  count(protein) AS ProteinsDetected,
  avg(r.coverage) AS AvgCoverage,
  avg(r.peptides) AS AvgPeptides,
  avg(r.intensity) AS AvgIntensity
RETURN
  exp.sid AS Experiment,
  exp.platform AS Platform,
  exp.method AS Method,
  ProteinsDetected,
  round(AvgCoverage, 1) AS AvgCoverage_Percent,
  round(AvgPeptides, 1) AS AvgPeptides,
  round(AvgIntensity, 1) AS AvgIntensity
ORDER BY ProteinsDetected DESC;
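The aggregation in this query can be mirrored in plain Python: group protein measurements by experiment, count the rows, and average coverage, peptide count and intensity, rounding to one decimal place as the query does. The records below are hypothetical and the function name is illustrative; the field names follow the `HAS_VALUE` properties in the data model. Unlike Cypher's `avg()`, which skips nulls, this sketch assumes every value is present.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical measurements (one row per HAS_VALUE relationship)
SAMPLE = [
    {"experiment": "EXP_A", "protein": "P1", "coverage": 40.0, "peptides": 12, "intensity": 1.5e6},
    {"experiment": "EXP_A", "protein": "P2", "coverage": 25.0, "peptides": 7, "intensity": 8.0e5},
    {"experiment": "EXP_B", "protein": "P1", "coverage": 33.3, "peptides": 9, "intensity": 1.1e6},
]


def qc_summary(measurements):
    """Summarise proteomics measurements per experiment, most proteins first."""
    by_exp = defaultdict(list)
    for m in measurements:
        by_exp[m["experiment"]].append(m)
    rows = []
    for exp, items in by_exp.items():
        rows.append({
            "experiment": exp,
            "proteins_detected": len(items),  # like count(protein): counts rows
            "avg_coverage_percent": round(mean(m["coverage"] for m in items), 1),
            "avg_peptides": round(mean(m["peptides"] for m in items), 1),
            "avg_intensity": round(mean(m["intensity"] for m in items), 1),
        })
    rows.sort(key=lambda r: r["proteins_detected"], reverse=True)
    return rows
```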

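Cross-platform proteomics comparison

Compare measurements of the same protein across different proteomics platforms. This shows how consistent the measurements are across platforms, which helps validate the findings.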

// Cross-platform proteomics comparison
WITH
  "P48736" AS proteinSid,
  "Proteomics" AS experimentType
MATCH (exp:Experiment {type: experimentType})-[r:HAS_VALUE]->(p:Protein {sid: proteinSid})
RETURN
  p.name AS Protein,
  exp.sid AS Experiment,
  exp.platform AS Platform,
  exp.method AS Method,
  r.logFC AS logFC,
  r.pValue AS PValue,
  r.peptides AS Peptides,
  r.coverage AS Coverage
ORDER BY exp.sid;

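Multi-omics integrated pathway analysis

Find GO terms shared by the protein products of several genes. As written, the query counts every gene in the graph; to focus on genes found differential in the multi-omics comparison, uncomment the COMP009 MATCH line.

// Find GO terms annotated to the products of at least minGeneCount genes
WITH 2 AS minGeneCount
// Optional: restrict to genes found differential in COMP009
// MATCH (comp:Comparison {sid: "COMP009"})-[:FOUND_DIFFERENTIAL]->(gene:Gene)
MATCH (gene:Gene)-[:CODES]->(protein:Protein)-[:ASSOCIATED_WITH]->(go:GO)
WITH
  go,
  minGeneCount,
  count(DISTINCT gene) AS GeneCount,
  collect(DISTINCT gene.symbol) AS Genes
WHERE GeneCount >= minGeneCount
RETURN
  go.name AS Pathway,
  go.sid AS goId,
  GeneCount AS GenesInPathway,
  Genes
ORDER BY GeneCount DESC;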
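The grouping step of this query (count distinct genes per GO term, keep terms at or above a threshold) can be sketched in Python. Here it runs on the demo `ASSOCIATED_WITH` annotations at the protein level, assuming one gene per protein as in the `CODES` statements; the function name is hypothetical.

```python
from collections import defaultdict

# GO annotations per protein, from the demo ASSOCIATED_WITH statements
PROTEIN_GO = {
    "P48736": {"GO:0008286", "GO:0042593"},
    "P02778": {"GO:0006954", "GO:0071356"},
    "P42345": {"GO:0008286", "GO:0042593"},
    "P31749": {"GO:0008286", "GO:0042593"},
    "P37231": {"GO:0005158", "GO:0006629", "GO:0030154", "GO:0051091"},
    "P16671": {"GO:0006629"},
    "P02768": {"GO:0006955"},
    "P01375": {"GO:0071356"},
    "P35354": {"GO:0071356"},
    "P01308": {"GO:0006006", "GO:0042593"},
}


def go_term_counts(protein_go, min_count=2):
    """Return (GO term, member count, members) for terms with >= min_count members."""
    members = defaultdict(set)
    for protein, terms in protein_go.items():
        for term in terms:
            members[term].add(protein)
    rows = [(term, len(ps), sorted(ps)) for term, ps in members.items() if len(ps) >= min_count]
    # Largest terms first; ties broken by GO id
    rows.sort(key=lambda r: (-r[1], r[0]))
    return rows
```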

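Genes overexpressed at both the mRNA and protein levels provide a disease-specific signature

We start from a specific Disease (identified by its MONDO ID) that is studied by a Comparison, and from that Comparison find genes that are upregulated at both the transcript and the protein level.

This helps identify robust disease signatures supported by more than one omics layer, and provides high-confidence targets for further study.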

WITH
  "MONDO:0005359" AS diseaseSid,  // Non-alcoholic fatty liver disease
  "up" AS regulationDirection,
  "transcriptomics" AS rnaDataType,
  "proteomics" AS proteomicsDataType
MATCH
  (disease:Disease {sid: diseaseSid})<-[:STUDIES_DISEASE]-(comp:Comparison)-[rnaR:FOUND_DIFFERENTIAL]->(gene:Gene)
WHERE rnaR.regulated = regulationDirection AND rnaR.data_type = rnaDataType
MATCH
  (gene)-[:CODES]->(protein:Protein)<-[protR:FOUND_DIFFERENTIAL]-(comp)
WHERE protR.regulated = regulationDirection AND protR.data_type = proteomicsDataType
RETURN
  disease.name AS Disease,
  gene.symbol AS Gene,
  rnaR.logFC AS Transcript_FC,
  protR.logFC AS Protein_FC,
  (rnaR.logFC + protR.logFC) / 2.0 AS Avg_logFC,
  round(abs(rnaR.logFC - protR.logFC), 2) AS mRNA_Protein_Correlation_Delta
ORDER BY Avg_logFC DESC;
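The two derived columns are simple arithmetic: `Avg_logFC` is the mean of the transcript and protein fold changes, and `mRNA_Protein_Correlation_Delta` is their rounded absolute difference (a distance, not a correlation coefficient). A Python sketch using the COMP001 values listed at the start of this section; whether those values are on a log scale is not stated there, so treat them as illustrative. The function name is hypothetical.

```python
# COMP001 example values: gene -> (transcript FC, protein FC), both "up"
COMP001_VALUES = {"PIK3CG": (2.5, 2.3), "CD36": (2.8, 3.1), "CXCL10": (3.2, 2.8)}


def signature_rows(pairs):
    """Return (gene, average fold change, rounded |difference|), highest average first."""
    rows = []
    for gene, (rna, prot) in pairs.items():
        avg = (rna + prot) / 2.0
        delta = round(abs(rna - prot), 2)  # small delta = close mRNA/protein agreement
        rows.append((gene, avg, delta))
    rows.sort(key=lambda r: r[1], reverse=True)
    return rows
```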

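Druggable proteins in the PPI neighborhood

Find proteins that lie in the PPI (protein-protein interaction) neighborhood of genes overexpressed in the RNA-seq data. This matters because even if your target protein is not druggable, its interaction partners may be, offering alternative therapeutic strategies. Note that the query ranks these neighbors by the number of diseases linked to their encoding genes; it does not test druggability itself, which would require drug annotations in the graph.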

WITH
  "EXP001" AS experimentSid,
  "RNA-seq" AS experimentType,
  "up" AS regulationDirection,
  0.05 AS pValueThreshold
MATCH
  (exp:Experiment {sid: experimentSid, type: experimentType})-[r:HAS_VALUE]->(gene:Gene)
WHERE
  r.regulated = regulationDirection AND r.pValue < pValueThreshold
MATCH
  (gene)-[:CODES]->(protein:Protein)-[:INTERACTS_WITH*1..2]-(neighbor:Protein)<-[:CODES]-(neighborGene:Gene)
OPTIONAL MATCH
  (neighborGene)-[:RELATED_TO]->(disease:Disease)
WITH
  DISTINCT neighbor,
  count(DISTINCT disease) AS `Disease Associations`,
  collect(DISTINCT disease.name) AS `Associated Diseases`
WHERE
  `Disease Associations` > 0
RETURN
  neighbor.name AS `Potential Drug Target`,
  neighbor.sid AS `UniProt Identifier`,
  `Disease Associations`,
  `Associated Diseases`
ORDER BY `Disease Associations` DESC;

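What is the provenance and confidence of the evidence for a target?

Starting from a gene of interest (e.g., PIK3CG), collect all supporting evidence from literature, expression data and pathway associations, together with any available confidence scores.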

// Get all evidence and provenance for PIK3CG
WITH "PIK3CG" AS geneSymbol
MATCH (g:Gene {symbol: geneSymbol})
OPTIONAL MATCH (g)-[r1:RELATED_TO]->(d:Disease)
OPTIONAL MATCH (g)<-[hv:HAS_VALUE]-(e:Experiment)<-[:COMPARES]-(c:Comparison)
OPTIONAL MATCH (g)-[:CODES]->(p:Protein)-[r2:ASSOCIATED_WITH]->(go:GO)
RETURN
  g.symbol AS Gene,
  collect(DISTINCT {
    source: r1.source,
    disease: d.name,
    confidence: r1.score
  }) AS `Literature Evidence`,
  collect(DISTINCT {
    experiment: e.sid,
    comparison: c.name,
    logFC: hv.logFC,
    pValue: hv.pValue,
    regulation: hv.regulated
  }) AS `Expression Evidence`,
  collect(DISTINCT {
    source: r2.source,
    pathway: go.name,
    goId: go.sid
  }) AS `Pathway Evidence`;

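Among disease-associated proteins, are there functional modules or subnetworks (pathways, protein complexes) that would make more robust therapeutic targets than single proteins?

The following queries help identify the key biological processes driving a disease and highlight potential intervention points that may be more effective than targeting individual proteins one at a time.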

// Find pathways enriched for disease-associated proteins
// Aggregate per pathway so a protein reached through several GO terms is counted once
MATCH
  (:Disease)<-[:RELATED_TO]-(:Gene)-[:CODES]->(p:Protein)-[:ASSOCIATED_WITH]->(go:GO)-[:IS_PART_OF]->(pathway:Pathway)
WITH
  pathway, collect(DISTINCT p.name) AS proteins
RETURN
  pathway.name AS Pathway,
  size(proteins) AS ProteinCount,
  proteins AS ProteinsInPathway
ORDER BY ProteinCount DESC;
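Grouping by pathway matters because one protein can reach the same pathway through several GO terms; in the demo data, PPARG's product P37231 reaches KEGG:03320 through GO:0006629, GO:0030154 and GO:0051091. A Python sketch of the per-pathway count over a subset of the demo data: the GO annotations of P37231, P42345, P31749 and P16671 (products of PPARG, MTOR, AKT1 and CD36, which the demo data relates to diseases) and the demo `IS_PART_OF` edges. The function name is hypothetical.

```python
from collections import defaultdict

# GO annotations of a subset of disease-associated proteins (demo ASSOCIATED_WITH edges)
DISEASE_PROTEIN_GO = {
    "P37231": {"GO:0005158", "GO:0006629", "GO:0030154", "GO:0051091"},
    "P42345": {"GO:0008286", "GO:0042593"},
    "P31749": {"GO:0008286", "GO:0042593"},
    "P16671": {"GO:0006629"},
}

# Demo IS_PART_OF edges from GO terms to pathways
GO_PATHWAY = {
    "GO:0008286": {"KEGG:04910", "REACTOME:R-HSA-74751", "KEGG:04151"},
    "GO:0005158": {"KEGG:04910"},
    "GO:0042593": {"KEGG:04910", "KEGG:04151", "KEGG:04150"},
    "GO:0043066": {"KEGG:04151"},
    "GO:0006954": {"KEGG:04064", "WIKIPATHWAYS:WP1471", "REACTOME:R-HSA-449147"},
    "GO:0071356": {"KEGG:04064", "WIKIPATHWAYS:WP1471", "REACTOME:R-HSA-449147"},
    "GO:0006955": {"WIKIPATHWAYS:WP1471"},
    "GO:0006629": {"KEGG:04920", "KEGG:03320"},
    "GO:0030154": {"KEGG:03320"},
    "GO:0051091": {"KEGG:03320"},
    "GO:0006006": {"KEGG:04910", "KEGG:04920"},
}


def proteins_per_pathway(protein_go, go_pathway):
    """Return (pathway, protein count, proteins), largest pathways first."""
    members = defaultdict(set)
    for protein, terms in protein_go.items():
        for term in terms:
            for pathway in go_pathway.get(term, ()):
                members[pathway].add(protein)  # a set: each protein counted once
    rows = [(pw, len(ps), sorted(ps)) for pw, ps in members.items()]
    rows.sort(key=lambda r: (-r[1], r[0]))
    return rows
```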

// Find GO terms enriched among Type 2 Diabetes-associated genes
WITH
  "Type 2 Diabetes" AS diseaseName,
  2 AS geneCountThreshold
MATCH (d:Disease {name: diseaseName})-[:RELATED_TO]-(g:Gene)-[:CODES]->(p:Protein)-[:ASSOCIATED_WITH]-(go:GO)
WITH geneCountThreshold, go, count(DISTINCT g) AS geneCount
WHERE geneCount >= geneCountThreshold
RETURN
  go.sid AS GOTerm,
  go.name AS GOName,
  geneCount AS NumGenesInModule
ORDER BY geneCount DESC;

// Find interconnected protein complexes in disease
WITH "Type 2 Diabetes" AS diseaseName
MATCH
  path=(d:Disease {name: diseaseName})<-[:RELATED_TO]-(g1:Gene)-[:CODES]->(p1:Protein)-[:INTERACTS_WITH]-(p2:Protein)<-[:CODES]-(g2:Gene)-[:RELATED_TO]->(d)
RETURN
  g1.symbol AS Gene1,
  g2.symbol AS Gene2,
  p1.sid AS Protein1,
  p2.sid AS Protein2
LIMIT 20;
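Because this pattern is undirected, each interacting pair is matched from both ends, so the query returns every pair more than once (at minimum with Gene1 and Gene2 swapped). A Python sketch of the same check that keeps each unordered pair once, using the demo interaction pairs and, as an assumption, the Type 2 Diabetes proteins of PPARG, MTOR and AKT1; the function name is hypothetical.

```python
# Undirected PPI pairs from the demo INTERACTS_WITH statements
PPI_EDGES = [
    ("P48736", "P31749"), ("P31749", "P42345"), ("P31749", "P37231"),
    ("P16671", "P48736"), ("P48736", "P42345"), ("P42345", "P37231"),
    ("P31749", "P01308"), ("P37231", "P01308"), ("P16671", "P31749"),
    ("P16671", "P37231"), ("P02778", "P01375"), ("P01375", "P35354"),
    ("P02778", "P35354"), ("P48736", "P02768"), ("P05067", "P10636"),
    ("P31749", "P05067"), ("P42345", "P05067"),
]

# Proteins of genes related to MONDO:0005015 in the demo data
T2D_PROTEINS = {"P37231": "PPARG", "P42345": "MTOR", "P31749": "AKT1"}


def interacting_pairs(edges, proteins):
    """Unordered pairs of `proteins` that interact, each reported once."""
    pairs = set()
    for a, b in edges:
        if a in proteins and b in proteins and a != b:
            pairs.add(tuple(sorted((a, b))))  # sorting makes (a, b) and (b, a) identical
    return sorted(pairs)
```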

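Find genes that share multiple GO terms with known disease genes but are not yet annotated to the disease

We aim to find new candidate genes for a disease (e.g., type 2 diabetes) based on the GO terms they share with known disease-associated genes.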

// Find genes that share multiple GO terms with known disease genes but aren't annotated to the disease
WITH
  "Type 2 Diabetes" AS diseaseName,
  2 AS minSharedPathways
MATCH
  (disease:Disease {name: diseaseName})<-[:RELATED_TO]-(known:Gene)-[:CODES]->(kp:Protein)-[:ASSOCIATED_WITH]->(go:GO),
  (go)<-[:ASSOCIATED_WITH]-(cp:Protein)<-[:CODES]-(candidate:Gene)
WHERE
  NOT (candidate)-[:RELATED_TO]->(disease)
  AND candidate <> known
WITH
  candidate, disease, minSharedPathways,
  count(DISTINCT go) AS sharedPathways,
  collect(DISTINCT go.name) AS pathways
WHERE sharedPathways >= minSharedPathways
RETURN
  candidate.symbol AS NovelCandidate,
  disease.name AS Disease,
  sharedPathways AS SharedPathwayCount,
  pathways AS SharedPathways
© . This site is unofficial and not affiliated with Neo4j, Inc.