A Maritime Document Knowledge Graph Construction Method Based on Conceptual Proximity Relations

Abstract: The cost and strict input format requirements of GraphRAG make it inefficient for processing large documents. This paper proposes an alternative approach for constructing a knowledge graph (KG) from a PDF document, with a focus on simplicity and cost-effectiveness. The process splits the document into chunks, extracts concepts from each chunk using a large language model (LLM), and builds relationships based on the proximity of concepts within the same chunk. Unlike traditional named entity recognition (NER), which identifies entities such as “Shanghai”, the proposed method identifies concepts such as “Convenient transportation in Shanghai”, which are found to be more meaningful for KG construction. Each edge in the KG represents a relationship between concepts occurring in the same text chunk. The process is computationally inexpensive: it relies on locally hosted tools such as Mistral 7B openorca instruct and Ollama for model inference, making the entire graph generation process free of charge. The paper also introduces a method for assigning weights to relationships, grouping similar pairs, and summarizing multiple relationships into a single edge with an associated weight and relation details. Additionally, node degrees and communities are calculated for node sizing and coloring. This approach offers a scalable, cost-effective way to generate meaningful knowledge graphs from large documents, achieving results comparable to GraphRAG while remaining usable on personal machines.
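The proximity step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the per-chunk concepts have already been extracted (the hard-coded lists stand in for LLM output), and it assumes the simplest possible weight, namely the number of chunks in which a pair of concepts co-occurs. The function name `build_proximity_graph` and the sample concepts are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_proximity_graph(chunk_concepts):
    """Return (edge_weights, node_degrees) from per-chunk concept lists.

    chunk_concepts: list of lists of concept strings, one inner list per chunk.
    """
    edge_weights = Counter()
    for concepts in chunk_concepts:
        # Deduplicate and sort so each unordered pair has one canonical key.
        unique = sorted(set(concepts))
        for a, b in combinations(unique, 2):
            # Assumed weight: number of chunks in which the pair co-occurs.
            edge_weights[(a, b)] += 1

    # Node degree = number of distinct neighbours, usable for node sizing.
    neighbours = defaultdict(set)
    for a, b in edge_weights:
        neighbours[a].add(b)
        neighbours[b].add(a)
    degrees = {node: len(adj) for node, adj in neighbours.items()}
    return dict(edge_weights), degrees


# Hypothetical concepts, standing in for what an LLM might return per chunk.
chunks = [
    ["port congestion", "vessel scheduling", "berth allocation"],
    ["vessel scheduling", "berth allocation"],
    ["port congestion", "fuel cost"],
]
edges, degrees = build_proximity_graph(chunks)
```

Repeated co-occurrences collapse into one weighted edge per concept pair, which mirrors the idea of summarizing multiple relationships into a single edge. Community detection for node coloring is omitted here and would need a graph library.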

Authors: Yiwen Lin; Tao Yang; Yuqi Shao; Meng Yuan; Pinghua Hu; Chen Li (COSCO Shipping Technology Co., Ltd., Shanghai, China; COSCO Shipping Specialized Carriers Co., Ltd., Guangzhou, China)

Affiliations: COSCO Shipping Technology Co., Ltd.; COSCO Shipping Specialized Carriers Co., Ltd.

Source: Journal of Computer and Communications, 2025, No. 2, pp. 51-67 (17 pages)

Keywords: Knowledge Graph; Large Language Model; Concept Extraction; Cost-Effective Graph Construction

Classification code: O15 [Science: Basic Mathematics]

