Abstract by Piper Armstrong
The Importance of Importance: Improving Cross-Reference Candidate Suggestion with PageRank
Cross-references are a useful tool for studying and understanding the content of a corpus. They allow users to find related sections of text; moreover, good cross-references focus on topically important sections of text. Prior work in automatic cross-referencing employs the Anchor Words algorithm and fine-grained topic modeling but addresses only the search for topically related documents. Our work focuses on the second element of good cross-references: the search for topically important ones. Our approach uses a novel combination of clustering and the PageRank algorithm to assign degrees of importance to the documents in the corpus. It then uses those ranks in conjunction with topical similarity to order candidate cross-reference pairs, resulting in significant improvement over the use of topic similarity alone.
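
To make the ranking idea concrete, the following is a minimal sketch and not the implementation described in this work. It assumes a precomputed, symmetric topical-similarity matrix, runs a standard power-iteration PageRank over the graph that matrix induces, and orders candidate pairs by similarity multiplied by both documents' ranks. The function names `pagerank` and `rank_candidate_pairs`, the damping factor of 0.85, and the product scoring rule are illustrative assumptions; the abstract does not specify how ranks and similarity are combined, and the clustering step is omitted here.

```python
from itertools import combinations


def pagerank(weights, damping=0.85, iterations=100):
    """Power-iteration PageRank on a weighted graph given as a square matrix."""
    n = len(weights)
    ranks = [1.0 / n] * n
    out_sums = [sum(row) for row in weights]
    for _ in range(iterations):
        # Every document receives the same teleportation share.
        new_ranks = [(1.0 - damping) / n] * n
        for i in range(n):
            if out_sums[i] == 0:
                # Dangling node: spread its rank evenly over all documents.
                for j in range(n):
                    new_ranks[j] += damping * ranks[i] / n
            else:
                # Pass rank along edges in proportion to their weight.
                for j in range(n):
                    new_ranks[j] += damping * ranks[i] * weights[i][j] / out_sums[i]
        ranks = new_ranks
    return ranks


def rank_candidate_pairs(similarity, damping=0.85):
    """Order document pairs by similarity weighted by both documents' PageRank.

    The product score is an assumed combination rule, used for illustration.
    """
    n = len(similarity)
    # Build the graph from similarities, ignoring self-links.
    graph = [
        [similarity[i][j] if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    ranks = pagerank(graph, damping)
    scored = [
        (similarity[i][j] * ranks[i] * ranks[j], i, j)
        for i, j in combinations(range(n), 2)
    ]
    scored.sort(reverse=True)
    return ranks, [(i, j) for _, i, j in scored]


# Toy similarity matrix (made-up values): document 0 is strongly
# related to every other document, the rest only weakly to each other.
toy_similarity = [
    [1.0, 0.9, 0.9, 0.9],
    [0.9, 1.0, 0.1, 0.1],
    [0.9, 0.1, 1.0, 0.1],
    [0.9, 0.1, 0.1, 1.0],
]
toy_ranks, toy_pairs = rank_candidate_pairs(toy_similarity)
```

In this toy setup, document 0 receives the highest rank, so pairs involving it move to the front of the candidate list. Under similarity alone, pairs with equal similarity would be indistinguishable; the rank term is what separates them.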