Semantic proximity search tool
This tool performs semantic proximity search between nodes on a graph, using either anchored metagraphs or node embeddings as features. The anchored metagraph, introduced in our TKDE19 paper, extends the metagraph from our ICDE16 paper.
Semantic Proximity Search on Graphs with Metagraph-based Learning.
Y. Fang, W. Lin, V. W. Zheng, M. Wu, K. C.-C. Chang and X. Li.
In ICDE 2016, pp. 277--288.
Code and Data
Usage (anchored metagraphs as features)
java -cp CLASSPATH -Dconfig=CONFIG_FILE -Dsplits=NUM_SPLITS -Dfdb=FEATURE_FILE -Dgt=GROUND_TRUTH_FILE -Dtest=TEST_PREFIX -Dtrain=TRAIN_PREFIX -Dpred=PRED_PREFIX exec.Learn
Command line arguments
- -cp: Java class path for the JAR files.
- -Dconfig: Model configuration file, which stores the hyperparameters.
- -Dsplits: Number of splits for train/test sets (see -Dtest below for details).
- -Dfdb: A file in Metagraph Feature Format.
- -Dgt: The ground truth file.
- -Dtest: The prefix of the test set filenames. The program automatically appends the numbers 1, 2, 3, ... (up to the number of splits) to the prefix to form the actual filenames. For example, if -Dtest=test and -Dsplits=10, the program attempts to load test set files named test1, test2, ..., test10.
- -Dtrain: The prefix of the train set filenames. Works the same way as -Dtest.
- -Dpred: The prefix of the prediction filenames. Works the same way as -Dtest.
The format of configuration, ground truth, train, test and prediction files can be found in the File Format section below.
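The prefix convention used by -Dtest, -Dtrain and -Dpred can be sketched in a few lines of Python. This is only an illustration of the naming rule described above, not code from the tool; the function name split_filenames is hypothetical.

```python
def split_filenames(prefix, splits):
    """Return the filenames derived from a prefix: prefix1, prefix2, ..., prefix<splits>."""
    # Numbering starts at 1 and runs up to the number of splits, inclusive.
    return [f"{prefix}{i}" for i in range(1, splits + 1)]


if __name__ == "__main__":
    # List the test files the tool would look for with -Dsplits=3.
    for name in split_filenames("data/dblp/advisor/test", 3):
        print(name)
```

A helper like this can be handy for checking that all expected train and test files exist before launching a run.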
Sample command line
java -cp lib/*: -Dconfig=config -Dsplits=10 -Dfdb=data/dblp/feature.dblp -Dgt=data/dblp/advisor/ground_truth.txt -Dtest=data/dblp/advisor/test -Dtrain=data/dblp/advisor/train -Dpred=data/dblp/advisor/pred exec.Learn
Usage (node embeddings as features)
java -cp lib/*: -Dconfig=CONFIG_FILE -Dsplits=NUM_SPLITS -Demb=EMBEDDING_FILE -Dgt=GROUND_TRUTH_FILE -Dtest=TEST_PREFIX -Dtrain=TRAIN_PREFIX -Dpred=PRED_PREFIX exec.LearnEmb
Command line arguments
- -Demb: A file containing node embeddings in text format. The first line contains two integers: the number of nodes and the number of dimensions. Each subsequent line contains the embedding of one node. All lines are space delimited.
- All other arguments are the same as using anchored metagraphs as features above.
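To sanity-check an embedding file before passing it to -Demb, a small reader along these lines may help. It is a sketch under one assumption that the description above does not state explicitly: each embedding line starts with the node ID, followed by the vector values (the common word2vec-style text layout). The function name read_embeddings is hypothetical.

```python
def read_embeddings(lines):
    """Parse a space-delimited embedding file given as an iterable of lines.

    Assumption: every line after the header is "<node_id> <v1> ... <vd>".
    """
    it = iter(lines)
    header = next(it).split()
    num_nodes, dim = int(header[0]), int(header[1])

    embeddings = {}
    for line in it:
        parts = line.split()
        if not parts:
            continue  # ignore blank lines
        node_id = parts[0]
        values = [float(x) for x in parts[1:]]
        if len(values) != dim:
            raise ValueError(f"node {node_id}: expected {dim} values, got {len(values)}")
        embeddings[node_id] = values

    if len(embeddings) != num_nodes:
        raise ValueError(f"header declares {num_nodes} nodes, found {len(embeddings)}")
    return embeddings


if __name__ == "__main__":
    sample = ["2 3", "n1 0.1 0.2 0.3", "n2 1 2 3"]
    print(read_embeddings(sample))
```

If your embedding files omit the node ID, the parsing of each line would need to be adjusted accordingly.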
File Format
Configuration file
Stores the following hyperparameters of the learner. Default values are included in the sample file.
- Learning rate for gradient descent.
- Maximum relative difference before gradient descent stops.
- Number of trials to perform gradient descent using different seeds.
- Factor to adjust the learning rate when the loss is decreasing after each iteration.
- Factor to adjust the learning rate when the loss is not decreasing after each iteration.
- Scaling factor in Eq. (4) of our TKDE19 paper.
- Whether to apply log1p transformation on the metagraph features.
- Whether to suppress messages.
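To give a feel for how the learning rate, the stopping threshold, the two adjustment factors and the number of trials could interact, here is a generic gradient descent sketch. It is not the tool's learner: the parameter names, the multiplicative use of the factors, the rejection of non-improving steps and the exact stopping rule are all assumptions made for illustration.

```python
import math
import random


def minimize(loss, grad, dim, learning_rate=0.1, tolerance=1e-6, trials=3,
             grow=1.1, shrink=0.5, max_iters=10000, seed=0):
    """Gradient descent with an adaptive learning rate and several random restarts."""
    best_w, best_loss = None, math.inf
    for trial in range(trials):
        # Each trial starts from a different seeded random point.
        rng = random.Random(seed + trial)
        w = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
        lr = learning_rate
        prev = loss(w)
        for _ in range(max_iters):
            g = grad(w)
            candidate = [wi - lr * gi for wi, gi in zip(w, g)]
            cur = loss(candidate)
            if cur < prev:
                # Loss decreased: accept the step and enlarge the learning rate.
                rel_diff = (prev - cur) / max(abs(prev), 1e-12)
                w, prev = candidate, cur
                lr *= grow
                if rel_diff < tolerance:
                    break  # relative improvement is below the threshold
            else:
                # Loss did not decrease: reject the step and shrink the learning rate.
                lr *= shrink
                if lr < 1e-12:
                    break
        if prev < best_loss:
            best_w, best_loss = w, prev
    return best_w, best_loss


if __name__ == "__main__":
    # Toy quadratic with its minimum at (3, -1).
    f = lambda w: (w[0] - 3) ** 2 + (w[1] + 1) ** 2
    df = lambda w: [2 * (w[0] - 3), 2 * (w[1] + 1)]
    print(minimize(f, df, dim=2))
```

With a factor above 1 for decreasing loss and below 1 otherwise, the learning rate grows while progress is steady and backs off after an overshoot. Running several trials with different seeds guards against a poor starting point.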
Ground truth file
Each line represents one query, delimited by tabs. The first column in each line represents the query node ID, and subsequent columns represent the ground-truth (positive) nodes ranked by proximity.
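A ground truth file in this format can be loaded with a few lines of Python. The sketch below follows the description above directly; the function name parse_ground_truth and the node IDs in the example are made up.

```python
def parse_ground_truth(lines):
    """Map each query node ID to its positive nodes, in ranked order."""
    ground_truth = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if not fields[0]:
            continue  # skip empty lines
        # First column is the query; the remaining columns are ranked positives.
        ground_truth[fields[0]] = fields[1:]
    return ground_truth


if __name__ == "__main__":
    sample = ["q1\ta\tb\n", "q2\tc\n"]
    print(parse_ground_truth(sample))
```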
Train file
Each line contains a triple (q, a, b), where q denotes the query node, and a, b are two nodes such that node a should be ranked higher than node b w.r.t. q.
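To illustrate what such triples express, the sketch below derives every (q, a, b) triple implied by a ranked list of positive nodes and a set of negative nodes. It shows one plausible construction only; the provided train files may have been built differently (for example by sampling), and the function name pairwise_triples is hypothetical.

```python
def pairwise_triples(query, ranked_positives, negatives):
    """List all (q, a, b) triples in which a should rank above b for query q."""
    triples = []
    for i, a in enumerate(ranked_positives):
        # a outranks every positive node listed after it ...
        for b in ranked_positives[i + 1:]:
            triples.append((query, a, b))
        # ... and every negative node.
        for b in negatives:
            triples.append((query, a, b))
    return triples


if __name__ == "__main__":
    for triple in pairwise_triples("q", ["p1", "p2"], ["n1"]):
        print(triple)
```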
Test file
Each line represents a query, delimited by tabs. The first column in each line represents the query node ID, and subsequent columns represent the candidate nodes (both positive and negative) in randomized order.
Prediction file
Each line represents a query, delimited by tabs. The first column in each line represents the query node ID, and subsequent columns represent the candidate nodes (both positive and negative) in ranked order. Note that the query nodes may appear in an arbitrary order, different from the corresponding test file.
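Because the query order in a prediction file is arbitrary, any post-processing should match lines by query node ID rather than by line position. The sketch below does this while computing precision at k against the ground truth. The metric is chosen purely for illustration and is not necessarily the one used in our papers; the function names are hypothetical.

```python
def parse_queries(lines):
    """Map each query node ID to the node IDs that follow it on its line."""
    result = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0]:
            result[fields[0]] = fields[1:]
    return result


def mean_precision_at_k(pred_lines, gt_lines, k):
    """Average, over all predicted queries, the fraction of the top k that is positive."""
    predictions = parse_queries(pred_lines)
    ground_truth = parse_queries(gt_lines)
    scores = []
    for query, ranked in predictions.items():
        # Look up positives by query ID, so line order does not matter.
        positives = set(ground_truth.get(query, []))
        hits = sum(1 for node in ranked[:k] if node in positives)
        scores.append(hits / k)
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    pred = ["q2\tc\tx\n", "q1\ta\ty\tb\n"]
    gt = ["q1\ta\tb\n", "q2\tc\n"]
    print(mean_precision_at_k(pred, gt, k=2))
```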
We provide any code and/or data on an as-is basis. Use at your own risk.