[1] T. W. Yan and H. Garcia-Molina. Duplicate removal in information dissemination. In Proceedings of the 21st International Conference on Very Large Data Bases (VLDB '95), pages 66-77, San Francisco, CA, USA, September 1995. Morgan Kaufmann Publishers, Inc.
[2] N. Shivakumar and H. Garcia-Molina. SCAM: a copy detection mechanism for digital documents. In Proceedings of the 2nd International Conference in Theory and Practice of Digital Libraries (DL '95), Austin, Texas, June 1995.
[3] T. Yan and H. Garcia-Molina. The SIFT information dissemination system. In ACM TODS, 2000.
[4] J. W. Kirriemuir and P. Willett. Identification of duplicate and near-duplicate full-text records in database search outputs using hierarchic cluster analysis. Program: automated library and information systems, 29(3):241-256, 1995.
[5] C. Buckley, C. Cardie, S. Mardis, M. Mitra, D. Pierce, K. Wagstaff, and J. Walz. The Smart/Empire TIPSTER IR System. In TIPSTER Phase III Proceedings, Morgan Kaufmann, San Francisco, CA, 2000.
Broder A Z, Glassman S C, Manasse M S, et al. Syntactic clustering of the Web[C]//Proceedings of the 6th ACM International Conference on World Wide Web. USA: ACM Press, 1997: 1157-1166.
Cho J H, Shivakumar N, Garcia-Molina H. Finding replicated web collections[C]//Proceedings of the ACM International Conference on Management of Data. USA: ACM Press, 2000, 29(2): 355-366.