What Is Citation Recommendation?
Citation recommendation is an automated process that suggests relevant scholarly works to cite within a manuscript, based on the content of the current document and the citation patterns of existing literature.
- Improves literature coverage and scholarly rigor.
- Reduces manual search time for authors.
- Supports interdisciplinary discovery by surfacing non‑obvious connections.
What Is SymTax?
SymTax is a machine‑learning framework designed specifically for citation recommendation. It leverages symbolic taxonomy representations and graph‑based embeddings to model the semantic relationships between papers.
- Combines textual features (abstracts, titles) with citation network structure.
- Employs a hybrid of supervised and unsupervised learning.
- Optimized for scalability across large bibliographic corpora.
How Does SymTax Work?
The SymTax pipeline consists of three core stages:
- Feature Extraction: Textual embeddings are generated using transformer models; citation graphs are encoded with node2vec‑style walks.
- Taxonomy Construction: Papers are clustered into a hierarchical taxonomy based on semantic similarity and citation proximity.
- Recommendation Scoring: For a target manuscript, SymTax computes a relevance score for each candidate paper by aggregating taxonomy proximity, citation co‑occurrence, and contextual similarity.
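The scoring stage above can be illustrated with a minimal sketch. It assumes the three signals are combined as a weighted sum, which the description does not specify. The `Candidate` type, the helper names, the taxonomy-path representation, and the weights are all hypothetical choices made for illustration, not SymTax's actual implementation.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    paper_id: str
    taxonomy_path: tuple   # hypothetical: root-to-leaf labels, e.g. ("cs", "nlp")
    embedding: tuple       # hypothetical: dense text embedding
    cocitation_count: int  # times co-cited with works the manuscript already cites


def taxonomy_proximity(a, b):
    """Fraction of leading taxonomy levels two paths share (1.0 = identical path)."""
    shared = 0
    for x, y in zip(a, b):
        if x != y:
            break
        shared += 1
    return shared / max(len(a), len(b), 1)


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def score(query_path, query_emb, cand, max_cocite, weights=(0.3, 0.3, 0.4)):
    """Weighted sum of the three signals; the weights are assumed, not from the source."""
    w_tax, w_co, w_ctx = weights
    tax = taxonomy_proximity(query_path, cand.taxonomy_path)
    co = cand.cocitation_count / max_cocite if max_cocite else 0.0
    ctx = cosine(query_emb, cand.embedding)
    return w_tax * tax + w_co * co + w_ctx * ctx


def recommend(query_path, query_emb, candidates, k=10):
    """Return the ids of the k highest-scoring candidates."""
    max_cocite = max((c.cocitation_count for c in candidates), default=0)
    ranked = sorted(
        candidates,
        key=lambda c: score(query_path, query_emb, c, max_cocite),
        reverse=True,
    )
    return [c.paper_id for c in ranked[:k]]
```

In a learned system the weights would typically be fit on held-out citation data rather than fixed by hand; the fixed values here only keep the sketch self-contained.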
Training minimizes a contrastive loss that pulls relevant citations closer to the query in the embedding space while pushing irrelevant ones farther away.
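The exact form of the contrastive loss is not given here. As one common instance of the idea, the sketch below uses a triplet margin loss over Euclidean distances. The function names and the margin value are assumptions made for illustration.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Zero once the positive is closer to the anchor than the negative by `margin`.

    anchor:   embedding of the citing context (the query)
    positive: embedding of a paper that was actually cited
    negative: embedding of a paper that was not cited
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


# Well-separated triplet: the cited paper is already much closer, so no penalty.
print(triplet_loss((0.0, 0.0), (0.0, 1.0), (3.0, 4.0)))  # 0.0
# Inverted triplet: the uncited paper is closer, so the loss is positive.
print(triplet_loss((0.0, 0.0), (3.0, 4.0), (0.0, 1.0)))  # 5.0
```

Minimizing this quantity over many (query, cited, uncited) triplets is what moves relevant papers toward the query and irrelevant ones away from it.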
Why Use SymTax?
Empirical evaluations across five benchmark citation recommendation datasets demonstrate SymTax’s advantages:
- Higher Precision@k: Consistently outperforms baselines such as CiteULike‑based and DeepWalk‑based recommenders.
- Robustness to Domain Shift: The hierarchical taxonomy adapts to new research areas without extensive retraining.
- Interpretability: The taxonomy provides a human‑readable structure that explains why a paper is suggested.
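For reference, Precision@k is the fraction of the top k recommendations that are relevant. A minimal sketch, using a hypothetical function name and toy paper ids:

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k recommended ids that appear in the relevant set.

    Divides by k even when fewer than k items were returned, which is the
    usual convention and penalizes short recommendation lists.
    """
    if k <= 0:
        return 0.0
    return sum(1 for pid in ranked[:k] if pid in relevant) / k


# Toy example: one of the top two suggestions was actually cited.
print(precision_at_k(["p1", "p9", "p3", "p7"], {"p1", "p3"}, k=2))  # 0.5
```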
Performance Evaluation Overview
Key metrics reported in comparative studies include:
- Mean Reciprocal Rank (MRR)
- Recall@10 and Recall@20
- Normalized Discounted Cumulative Gain (nDCG)
Averaged across all five datasets, SymTax improves MRR by 12% over the next‑best model, which supports its use in real‑world scholarly workflows.
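The metrics listed above can be computed as in the following sketch. It assumes binary relevance (a candidate either was cited or was not). The function names are illustrative, and the toy ids carry no real results.

```python
import math


def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant item, or 0.0 if none is retrieved."""
    for rank, pid in enumerate(ranked, start=1):
        if pid in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs):
    """Average reciprocal rank over (ranked_list, relevant_set) pairs."""
    if not runs:
        return 0.0
    return sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs)


def recall_at_k(ranked, relevant, k):
    """Fraction of all relevant items that appear in the top k."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & set(relevant)) / len(relevant)


def ndcg_at_k(ranked, relevant, k):
    """Binary-relevance nDCG: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, pid in enumerate(ranked[:k], start=1)
        if pid in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg else 0.0


# Toy query: the first relevant paper appears at rank 2.
ranked = ["p3", "p1", "p2"]
relevant = {"p1", "p2"}
print(reciprocal_rank(ranked, relevant))   # 0.5
print(recall_at_k(ranked, relevant, k=2))  # 0.5
print(ndcg_at_k(ranked, relevant, k=3))
```

MRR rewards placing the first correct citation early, Recall@k measures how many of the true citations were recovered, and nDCG accounts for the position of every relevant item, so the three are usually reported together.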