Machine learning

Node classification

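The alpha version of node classification has been removed entirely and superseded by node classification pipelines. Before you can train a node classification model, you must create and configure a training pipeline.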

Train

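Some parts of training are now set through dedicated configuration procedures of the training pipeline. To take effect, these procedures must be called before the train procedure. The remaining parts have moved to the pipeline train procedure itself. See the table below.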

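Table 1. Changes in train configuration
1.x	2.x
`modelName`	This parameter is now configured only in `gds.beta.pipeline.nodeClassification.train`.
`featureProperties`	Replaced by `gds.beta.pipeline.nodeClassification.selectFeatures`. There is also a new procedure, `gds.beta.pipeline.nodeClassification.addNodeProperty`, for computing node properties on the input graph within the training pipeline and the resulting classification model.
`targetProperty`	This parameter is now configured only in `gds.beta.pipeline.nodeClassification.train`.
`holdoutFraction`	This parameter is now named `testFraction` and is configured in `gds.beta.pipeline.nodeClassification.configureSplit`.
`validationFolds`	This parameter is now configured only in `gds.beta.pipeline.nodeClassification.configureSplit`.
`metrics`	This parameter is now configured only in `gds.beta.pipeline.nodeClassification.train`.
`params`	Replaced by `gds.beta.pipeline.nodeClassification.addLogisticRegression`, which configures a single model candidate. The procedure can be called several times to add several model candidates. Random forest is also available as a new type of model candidate, via `gds.beta.pipeline.nodeClassification.addRandomForest`.
`randomSeed`	This parameter is now configured only in `gds.beta.pipeline.nodeClassification.train`.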
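As a rough illustration of how Table 1 maps onto the new procedures, the following Python sketch turns a 1.x-style training configuration (a plain dict) into the sequence of Cypher statements a 2.x user would issue. It only builds statement strings and never contacts a database. The helper names, the graph and pipeline names, and the exact argument shapes (for example, passing the split settings as a map to `configureSplit`) are assumptions for illustration; verify them against the GDS 2.x procedure reference before relying on them.

```python
import json


def cypher_literal(value):
    """Render a Python value as a Cypher literal (illustrative, not exhaustive)."""
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return repr(value)
    if isinstance(value, str):
        # JSON string escaping is also valid Cypher string syntax
        return json.dumps(value)
    if isinstance(value, list):
        return "[" + ", ".join(cypher_literal(v) for v in value) + "]"
    if isinstance(value, dict):
        return "{" + ", ".join(f"{k}: {cypher_literal(v)}" for k, v in value.items()) + "}"
    raise TypeError(f"unsupported value: {value!r}")


def migrate_node_classification_train(graph_name, pipeline_name, old_config):
    """Hypothetical helper: translate a 1.x train config into 2.x pipeline calls."""
    prefix = "gds.beta.pipeline.nodeClassification"
    pipe = cypher_literal(pipeline_name)
    calls = [f"CALL {prefix}.create({pipe})"]

    # featureProperties -> selectFeatures
    features = cypher_literal(old_config["featureProperties"])
    calls.append(f"CALL {prefix}.selectFeatures({pipe}, {features})")

    # holdoutFraction is renamed to testFraction; both split settings move to configureSplit
    split = {}
    if "holdoutFraction" in old_config:
        split["testFraction"] = old_config["holdoutFraction"]
    if "validationFolds" in old_config:
        split["validationFolds"] = old_config["validationFolds"]
    if split:
        calls.append(f"CALL {prefix}.configureSplit({pipe}, {cypher_literal(split)})")

    # each map in the old params list becomes one logistic regression candidate
    for candidate in old_config.get("params", []):
        calls.append(f"CALL {prefix}.addLogisticRegression({pipe}, {cypher_literal(candidate)})")

    # modelName, targetProperty, metrics and randomSeed stay on the train call
    train_config = {"pipeline": pipeline_name}
    for key in ("modelName", "targetProperty", "metrics", "randomSeed"):
        if key in old_config:
            train_config[key] = old_config[key]
    calls.append(f"CALL {prefix}.train({cypher_literal(graph_name)}, {cypher_literal(train_config)})")
    return calls


if __name__ == "__main__":
    old = {
        "modelName": "nc-model",
        "featureProperties": ["age"],
        "targetProperty": "class",
        "holdoutFraction": 0.2,
        "validationFolds": 5,
        "metrics": ["F1_WEIGHTED"],
        "params": [{"penalty": 0.0625}],
        "randomSeed": 42,
    }
    for statement in migrate_node_classification_train("myGraph", "pipe", old):
        print(statement)
```

Keeping each `params` entry as its own `addLogisticRegression` call mirrors the table: one call per model candidate, repeated as often as needed.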

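Table 2. Changes in pipeline configuration
1.x	2.x
`gds.beta.pipeline.nodeClassification.configureParams`	This procedure, which was used to add logistic regression model candidates, is no longer available. Logistic regression candidates are now added by calling `gds.beta.pipeline.nodeClassification.addLogisticRegression` one or more times.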

Predict

Apart from the parameters listed below, the API for node classification prediction is the same as before, but the procedures have new names: `gds.beta.pipeline.nodeClassification.predict.[mutate,stream,write]`.

Table 3. Changes in predict configuration
1.x	2.x
`batchSize`	The batch size is now optimized internally and can no longer be configured by the user.

Table 4. Replacements for predict procedures
1.x	2.x
`gds.alpha.ml.nodeClassification.predict.stream`	`gds.beta.pipeline.nodeClassification.predict.stream`
`gds.alpha.ml.nodeClassification.predict.mutate`	`gds.beta.pipeline.nodeClassification.predict.mutate`
`gds.alpha.ml.nodeClassification.predict.write`	`gds.beta.pipeline.nodeClassification.predict.write`
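The renames in Table 4, together with the removal of `batchSize` from Table 3, can be applied mechanically. The Python sketch below assumes a predict call is represented as a procedure name plus a configuration dict; the function and constant names are hypothetical and not part of GDS.

```python
ALPHA_PREFIX = "gds.alpha.ml.nodeClassification.predict."
BETA_PREFIX = "gds.beta.pipeline.nodeClassification.predict."
MODES = ("stream", "mutate", "write")


def migrate_node_classification_predict(procedure_name, config):
    """Hypothetical helper: map a 1.x predict call to its 2.x name and config."""
    if not procedure_name.startswith(ALPHA_PREFIX):
        raise ValueError(f"not a 1.x node classification predict procedure: {procedure_name}")
    mode = procedure_name[len(ALPHA_PREFIX):]
    if mode not in MODES:
        raise ValueError(f"unknown execution mode: {mode}")
    # batchSize is no longer user-configurable in 2.x, so it is dropped;
    # a new dict is built so the caller's config is left untouched
    new_config = {key: value for key, value in config.items() if key != "batchSize"}
    return BETA_PREFIX + mode, new_config
```

All three execution modes keep their names; only the procedure prefix changes.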

Link prediction

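The alpha version of link prediction has been removed entirely and superseded by link prediction pipelines. Before you can train a link prediction model, you must create and configure a training pipeline.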

Train

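Table 5. Changes in train configuration
1.x	2.x
`modelName`	This parameter is now configured only in `gds.beta.pipeline.linkPrediction.train`.
`featureProperties`	Replaced by `nodeProperties` in `gds.beta.pipeline.linkPrediction.addFeature`. There is also a procedure, `gds.beta.pipeline.linkPrediction.addNodeProperty`, for computing node properties on the input graph within the training pipeline and the resulting model.
`linkFeatureCombiner`	Replaced by the second positional argument of `gds.beta.pipeline.linkPrediction.addFeature`, called `featureType`.
`trainRelationshipType` and `testRelationshipType`	These parameters have been removed. Use `gds.beta.pipeline.linkPrediction.configureSplit` to set up the dataset split.
`validationFolds`	This parameter is now configured only in `gds.beta.pipeline.linkPrediction.configureSplit`.
`negativeClassWeight`	This parameter is now configured only in `gds.beta.pipeline.linkPrediction.train`.
`params`	Replaced by `gds.beta.pipeline.linkPrediction.addLogisticRegression`, which configures a single model candidate. The procedure can be called several times to add several model candidates. Random forest is also available as a new type of model candidate, via `gds.beta.pipeline.linkPrediction.addRandomForest`.
`randomSeed`	This parameter is now configured only in `gds.beta.pipeline.linkPrediction.train`.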
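A similar Python sketch can illustrate Table 5: it turns a 1.x-style link prediction training configuration into 2.x pipeline statements without touching a database. The helper names are hypothetical, and several details are assumptions rather than confirmed GDS behavior: that the old combiner value maps to a lower-case `featureType` string, that all old feature properties go into a single `addFeature` call, and that `configureSplit` accepts `validationFolds` in a configuration map. Because `trainRelationshipType` and `testRelationshipType` have no 2.x counterpart, the sketch simply drops them; any other split settings would have to be chosen anew.

```python
import json


def cypher_literal(value):
    """Render a Python value as a Cypher literal (illustrative, not exhaustive)."""
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return repr(value)
    if isinstance(value, str):
        return json.dumps(value)  # JSON string escaping is valid Cypher string syntax
    if isinstance(value, list):
        return "[" + ", ".join(cypher_literal(v) for v in value) + "]"
    if isinstance(value, dict):
        return "{" + ", ".join(f"{k}: {cypher_literal(v)}" for k, v in value.items()) + "}"
    raise TypeError(f"unsupported value: {value!r}")


def migrate_link_prediction_train(graph_name, pipeline_name, old_config):
    """Hypothetical helper: translate a 1.x link prediction train config into 2.x calls."""
    prefix = "gds.beta.pipeline.linkPrediction"
    pipe = cypher_literal(pipeline_name)
    calls = [f"CALL {prefix}.create({pipe})"]

    # linkFeatureCombiner becomes the featureType positional argument (lower-casing is an assumption);
    # featureProperties becomes nodeProperties in the addFeature configuration map
    feature_type = cypher_literal(old_config["linkFeatureCombiner"].lower())
    feature_config = cypher_literal({"nodeProperties": old_config["featureProperties"]})
    calls.append(f"CALL {prefix}.addFeature({pipe}, {feature_type}, {feature_config})")

    # trainRelationshipType/testRelationshipType are dropped; the split now lives on the pipeline
    if "validationFolds" in old_config:
        split = cypher_literal({"validationFolds": old_config["validationFolds"]})
        calls.append(f"CALL {prefix}.configureSplit({pipe}, {split})")

    # each map in the old params list becomes one logistic regression candidate
    for candidate in old_config.get("params", []):
        calls.append(f"CALL {prefix}.addLogisticRegression({pipe}, {cypher_literal(candidate)})")

    # modelName, negativeClassWeight and randomSeed stay on the train call
    train_config = {"pipeline": pipeline_name}
    for key in ("modelName", "negativeClassWeight", "randomSeed"):
        if key in old_config:
            train_config[key] = old_config[key]
    calls.append(f"CALL {prefix}.train({cypher_literal(graph_name)}, {cypher_literal(train_config)})")
    return calls
```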

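Table 6. Changes in pipeline configuration
1.x	2.x
`gds.beta.pipeline.linkPrediction.configureParams`	This procedure, which was used to add logistic regression model candidates, is no longer available. Logistic regression candidates are now added by calling `gds.beta.pipeline.linkPrediction.addLogisticRegression` one or more times.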

Predict

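The API for link prediction is the same as before, but the procedures have new names: `gds.beta.pipeline.linkPrediction.predict.[mutate,stream]`. Note that link prediction no longer supports the write mode. You can emulate it by running the mutate mode followed by `gds.graph.relationship.write`.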
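The mutate-then-write workaround can be sketched in Python as a function that turns a 1.x write configuration into the two 2.x statements. Several details here are assumptions and should be checked against the GDS 2.x reference: the 1.x keys `writeRelationshipType` and `writeProperty`, their mutate counterparts `mutateRelationshipType` and `mutateProperty`, the default property name `"probability"`, and the positional argument order `(graphName, relationshipType, relationshipProperty)` for `gds.graph.relationship.write`. The function names are hypothetical.

```python
import json


def cypher_literal(value):
    """Render a Python value as a Cypher literal (illustrative, not exhaustive)."""
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return repr(value)
    if isinstance(value, str):
        return json.dumps(value)  # JSON string escaping is valid Cypher string syntax
    if isinstance(value, list):
        return "[" + ", ".join(cypher_literal(v) for v in value) + "]"
    if isinstance(value, dict):
        return "{" + ", ".join(f"{k}: {cypher_literal(v)}" for k, v in value.items()) + "}"
    raise TypeError(f"unsupported value: {value!r}")


def emulate_link_prediction_write(graph_name, old_write_config):
    """Hypothetical helper: replace a removed 1.x write call with mutate + relationship write."""
    config = dict(old_write_config)  # copy so the caller's dict is not modified
    relationship_type = config.pop("writeRelationshipType")
    # assumed default property name for predicted relationships
    relationship_property = config.pop("writeProperty", "probability")
    config["mutateRelationshipType"] = relationship_type
    config["mutateProperty"] = relationship_property
    graph = cypher_literal(graph_name)
    return [
        # step 1: add the predicted relationships to the in-memory graph
        f"CALL gds.beta.pipeline.linkPrediction.predict.mutate({graph}, {cypher_literal(config)})",
        # step 2: write those relationships and their property back to Neo4j
        f"CALL gds.graph.relationship.write({graph}, {cypher_literal(relationship_type)}, "
        f"{cypher_literal(relationship_property)})",
    ]
```

Other prediction settings, such as the model name or `topN`, are passed through to the mutate call unchanged.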

Table 7. Replacements for predict procedures
1.x	2.x
`gds.alpha.ml.linkPrediction.predict.stream`	`gds.beta.pipeline.linkPrediction.predict.stream`
`gds.alpha.ml.linkPrediction.predict.mutate`	`gds.beta.pipeline.linkPrediction.predict.mutate`
`gds.alpha.ml.linkPrediction.predict.write`	`-`