The Single-source Shortest Path component calculates the shortest paths from a specified source node to the other nodes in a graph by using the Dijkstra algorithm. The component is suitable for graphs with non-negative edge weights. It is widely applied in areas such as network routing, traffic planning, and geographic information systems.
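The following Python sketch illustrates the idea behind the Dijkstra algorithm on a small in-memory graph. It is not the component's implementation: the function name `dijkstra`, the vertices `s`, `t`, and `u`, and the assumption that each edge is directed are all made up for illustration.

```python
import heapq


def dijkstra(edges, source):
    """Return {vertex: shortest distance from source} for reachable vertices.

    edges is a list of (from_vertex, to_vertex, weight) tuples that are
    treated as directed edges (an assumption made for this sketch).
    """
    graph = {}
    for u, v, w in edges:
        if w < 0:
            # Dijkstra is only correct for non-negative edge weights.
            raise ValueError("negative edge weight: %r" % ((u, v, w),))
        graph.setdefault(u, []).append((v, w))

    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry; a shorter path to u was already found
        for v, w in graph.get(u, []):
            candidate = d + w
            if candidate < dist.get(v, float("inf")):
                dist[v] = candidate
                heapq.heappush(heap, (candidate, v))
    return dist


# Hypothetical three-vertex graph: the detour s -> t -> u beats the direct edge.
toy_edges = [("s", "t", 2.0), ("t", "u", 2.0), ("s", "u", 5.0)]
distances = dijkstra(toy_edges, "s")
print(distances)
```

Vertices that cannot be reached from the source never enter `dist`, so they are absent from the returned dictionary.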
Configure the component
Method 1: Configure the component on the pipeline page
On the pipeline details page in Machine Learning Designer, add the Single-source Shortest Path component to the pipeline and configure the parameters described in the following table.
Tab | Parameter | Description |
Fields Setting | Source Vertex Column | The start vertex column in the edge table. |
Fields Setting | Target Vertex Column | The end vertex column in the edge table. |
Fields Setting | Edge Weight Column | The edge weight column in the edge table. |
Parameters Setting | Initial Node ID | The ID of the start vertex from which the shortest paths are calculated. |
Tuning | Number of Workers | The number of workers that run the job in parallel. The degree of parallelism and framework communication costs increase with the value of this parameter. |
Tuning | Worker Memory (MB) | The maximum size of memory that a single job can use. Unit: MB. Default value: 4096. If the size of used memory exceeds the value of this parameter, the |
Method 2: Configure the component by using PAI commands
Configure the component parameters by using PAI commands. You can use the SQL Script component to call PAI commands. For more information, see Scenario 4: Execute PAI commands within the SQL script component.
PAI -name SSSP
-project algo_public
-DinputEdgeTableName=SSSP_func_test_edge
-DfromVertexCol=flow_out_id
-DtoVertexCol=flow_in_id
-DoutputTableName=SSSP_func_test_result
-DhasEdgeWeight=true
-DedgeWeightCol=edge_weight
-DstartVertex=a;
Parameter | Required | Default value | Description |
inputEdgeTableName | Yes | No default value | The name of the input edge table. |
inputEdgeTablePartitions | No | Full table | The partitions in the input edge table. |
fromVertexCol | Yes | No default value | The start vertex column in the input edge table. |
toVertexCol | Yes | No default value | The end vertex column in the input edge table. |
outputTableName | Yes | No default value | The name of the output table. |
outputTablePartitions | No | No default value | The partitions in the output table. |
lifecycle | No | No default value | The lifecycle of the output table. |
workerNum | No | No default value | The number of workers that run the job in parallel. The degree of parallelism and framework communication costs increase with the value of this parameter. |
workerMem | No | 4096 | The maximum size of memory that a single job can use. Unit: MB. Default value: 4096. If the size of used memory exceeds the value of this parameter, the |
splitSize | No | 64 | The data split size. Unit: MB. |
startVertex | Yes | No default value | The ID of the start vertex. |
hasEdgeWeight | No | false | Specifies whether the edges in the input edge table have weights. |
edgeWeightCol | No | No default value | The edge weight column in the input edge table. |
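When the command is generated from a script, it can help to check the required parameters before submitting it. The Python sketch below assembles the command text from a dictionary and rejects a dictionary that lacks any parameter marked as required in the table above. The helper `build_sssp_command` is hypothetical and not part of PAI; it only builds a string and does not run anything.

```python
# Parameters marked "Yes" in the Required column of the table above.
REQUIRED_PARAMS = (
    "inputEdgeTableName",
    "fromVertexCol",
    "toVertexCol",
    "outputTableName",
    "startVertex",
)


def build_sssp_command(params):
    """Return the PAI SSSP command text for the given {name: value} parameters.

    Hypothetical helper for illustration only; it performs no validation
    beyond checking that the required parameters are present.
    """
    missing = [name for name in REQUIRED_PARAMS if name not in params]
    if missing:
        raise ValueError("missing required parameters: " + ", ".join(missing))
    lines = ["PAI -name SSSP", "-project algo_public"]
    lines += ["-D%s=%s" % (name, value) for name, value in params.items()]
    return "\n".join(lines) + ";"


command = build_sssp_command({
    "inputEdgeTableName": "SSSP_func_test_edge",
    "fromVertexCol": "flow_out_id",
    "toVertexCol": "flow_in_id",
    "outputTableName": "SSSP_func_test_result",
    "hasEdgeWeight": "true",
    "edgeWeightCol": "edge_weight",
    "startVertex": "a",
})
print(command)
```

The generated text matches the sample command shown earlier and can be pasted into a SQL Script component.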
Example
On the pipeline details page, add a SQL Script component to the pipeline and click the component. On the Parameters Setting tab, clear Use Script Mode and Whether the system adds a create table statement, and enter the following SQL statements in the SQL Script editor:
drop table if exists SSSP_func_test_edge;
create table SSSP_func_test_edge as
select flow_out_id, flow_in_id, edge_weight
from (
  select "a" as flow_out_id, "b" as flow_in_id, 1.0 as edge_weight
  union all
  select "b" as flow_out_id, "c" as flow_in_id, 2.0 as edge_weight
  union all
  select "c" as flow_out_id, "d" as flow_in_id, 1.0 as edge_weight
  union all
  select "b" as flow_out_id, "e" as flow_in_id, 2.0 as edge_weight
  union all
  select "e" as flow_out_id, "d" as flow_in_id, 1.0 as edge_weight
  union all
  select "c" as flow_out_id, "e" as flow_in_id, 1.0 as edge_weight
  union all
  select "f" as flow_out_id, "g" as flow_in_id, 3.0 as edge_weight
  union all
  select "a" as flow_out_id, "d" as flow_in_id, 4.0 as edge_weight
) tmp;
Data structure
Add another SQL Script component to the pipeline and click the component. On the Parameters Setting tab, clear Use Script Mode and Whether the system adds a create table statement, and enter the following SQL statements in the SQL Script editor. Connect this component to the SQL Script component added in Step 1, which creates the edge table.
drop table if exists ${o1};
PAI -name SSSP
-project algo_public
-DinputEdgeTableName=SSSP_func_test_edge
-DfromVertexCol=flow_out_id
-DtoVertexCol=flow_in_id
-DoutputTableName=${o1}
-DhasEdgeWeight=true
-DedgeWeightCol=edge_weight
-DstartVertex=a;
In the upper-left corner of the canvas, click the run icon to run the pipeline.
After the pipeline finishes running, click the SQL Script component added in Step 2, and choose View Data > SQL Script Output to view the results.
| start_node | dest_node | distance | distance_cnt |
| ---------- | --------- | -------- | ------------ |
| a | a | 0.0 | 0 |
| a | b | 1.0 | 1 |
| a | c | 3.0 | 1 |
| a | d | 4.0 | 3 |
| a | e | 3.0 | 1 |
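The result table can be cross-checked by hand against the edge table. The Python sketch below does this with the shortest-path optimality condition: the distance of each vertex must equal the smallest value of (distance of a predecessor + edge weight) over its incoming edges. The sketch rests on two assumptions that the source does not state: each row of the edge table is a directed edge from flow_out_id to flow_in_id, and distance_cnt is the number of distinct shortest paths from the start vertex. The function name `check_result` is made up for illustration.

```python
# Edge table from the example: (flow_out_id, flow_in_id, edge_weight).
# Assumption: each row is a directed edge.
edges = [
    ("a", "b", 1.0), ("b", "c", 2.0), ("c", "d", 1.0), ("b", "e", 2.0),
    ("e", "d", 1.0), ("c", "e", 1.0), ("f", "g", 3.0), ("a", "d", 4.0),
]

# Result table for start_node "a": dest_node -> (distance, distance_cnt).
result = {"a": (0.0, 0), "b": (1.0, 1), "c": (3.0, 1), "d": (4.0, 3), "e": (3.0, 1)}
source = "a"


def check_result(edges, result, source):
    """Return a list of inconsistencies; an empty list means the table is consistent."""
    problems = []
    for vertex, (distance, count) in result.items():
        if vertex == source:
            continue
        # Path lengths that reach `vertex` through each incoming edge whose
        # origin is itself listed in the result table (that is, reachable).
        candidates = [
            (result[u][0] + w, u) for u, v, w in edges if v == vertex and u in result
        ]
        if not candidates:
            problems.append("%s has no reachable predecessor" % vertex)
            continue
        best = min(length for length, _ in candidates)
        if best != distance:
            problems.append("%s: distance %s, expected %s" % (vertex, distance, best))
        # Assumed meaning of distance_cnt: number of distinct shortest paths.
        # The source contributes one path even though its own row shows 0.
        expected = sum(
            1 if u == source else result[u][1]
            for length, u in candidates
            if length == best
        )
        if expected != count:
            problems.append("%s: distance_cnt %s, expected %s" % (vertex, count, expected))
    return problems


problems = check_result(edges, result, source)
print(problems)
```

Under these assumptions the check reports no inconsistencies. Vertex d has a distance_cnt of 3 because the paths a-d, a-b-c-d, and a-b-e-d all have length 4.0. Vertices f and g cannot be reached from a, which is consistent with their absence from the table.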