Patent9 Patents Online

Found 4 patents matching the query "Du Mingben" (search time: 0.1562421 seconds).

Invention patents: 4; Utility models: 0; Designs: 0

4 results in total, showing 1-4

1: [Invention] Document similarity recognition method and device based on latent semantic analysis

Application No.: 201911378044.7; Publication No.: CN111178038A; Main classification: G06F40/194

Applicant: Shandong Banner Information Co., Ltd. (山东旗帜信息有限公司); Filing date: 2019.12.27; Publication date: 2020.05.19

Inventors: Yu Wencai (于文才); Du Zhicheng (杜志诚); Du Mingben (杜明本); Zhong Qinlong (钟琴隆); Wang Xiuqin (王秀芹); Zhu Xiwen (朱习文); Dong Linlin (董林林); Ye Le (叶玏)

Abstract: A document similarity recognition method and device based on latent semantic analysis, comprising the following steps: constructing an original document library containing a number of original texts, each of which is preprocessed to obtain an original-text bag-of-words vector in one-to-one correspondence with it; acquiring an input text and preprocessing it to obtain an input-text bag-of-words vector; and calculating the degree of similarity between the input-text bag-of-words vector and the original-text bag-of-words vectors to find the original text most similar to the input text. The application first builds a document library and uses it as the base texts, then compares the input text against it as the main comparison text, using bag-of-words vectors to find the base-text documents similar to the input text. According to the application, because the bag-of-words vector itself takes semantics into account, a better document similarity recognition result can be obtained on the basis of latent semantics.
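The retrieval step in this abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: it assumes whitespace/regex tokenization as the "preprocessing", raw term counts as the bag-of-words vector, and cosine similarity as the "degree of similarity", and it omits any latent semantic (dimensionality-reduction) step. The function names `bag_of_words`, `cosine_similarity`, and `most_similar` are hypothetical. Chinese text would additionally need a word segmenter before counting.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Assumed preprocessing: lowercase, split on non-word characters, count tokens."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(input_text, library):
    """Return (index, score) of the library text closest to input_text."""
    if not library:
        raise ValueError("library must contain at least one text")
    # One bag-of-words vector per original text (one-to-one correspondence).
    library_vectors = [bag_of_words(text) for text in library]
    query = bag_of_words(input_text)
    scores = [cosine_similarity(query, vec) for vec in library_vectors]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]
```

For example, `most_similar("amount due on the invoice", library)` compares the query against every text in `library` and returns the position and score of the closest one. In practice the library vectors would be computed once and cached rather than rebuilt per query.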


2: [Invention] Training method and device for entity recognition model

Application No.: 202010016766.4; Publication No.: CN111222337A; Main classification: G06F40/295

Applicant: Shandong Banner Information Co., Ltd. (山东旗帜信息有限公司); Filing date: 2020.01.08; Publication date: 2020.06.02

Inventors: Yu Wencai (于文才); Du Zhicheng (杜志诚); Du Mingben (杜明本); Zhong Qinlong (钟琴隆); Chong Xuewei (崇学伟); Yu Xuelei (于雪磊); Yan Han (闫晗); Yang Hongchao (杨红超)

Abstract: A training method and device for an entity recognition model, comprising the following steps: obtaining a corpus for entity recognition; annotating the corpus; encoding the annotated corpus; and using the encoded corpus as training material for a deep learning network to obtain an entity recognition model, wherein the corpus is encoded with the BERT-WWM model. The application uses a specific form of encoding to preprocess the entity recognition corpus. The purpose of this preprocessing is not to provide a precise, machine-readable code, but to provide a tool that supplies multi-dimensional training language. Because the BERT-WWM model supports whole-word masking when processing the corpus, the deep learning network's ability to predict and correct errors can be trained on a specific corpus, which greatly improves learning efficiency and also improves recognition capability.
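The whole-word masking idea mentioned in this abstract can be sketched in a few lines of Python. This is a simplified illustration, not the BERT-WWM implementation or the patent's method: it assumes the text has already been split into words and each word into sub-tokens (for Chinese, the sub-tokens would typically be the characters of a segmented word), and it always substitutes `[MASK]` rather than applying the mixed replacement strategy used in BERT pretraining. The function name `whole_word_mask` and its parameters are hypothetical.

```python
import random

MASK = "[MASK]"


def whole_word_mask(words, mask_prob=0.15, rng=None):
    """Mask whole words at once.

    words: list of words, each given as a list of its sub-tokens.
    Returns a flat token list in which every sub-token of a selected
    word is replaced by MASK, so a word is never partially masked.
    """
    rng = rng or random.Random()
    masked = []
    for pieces in words:
        if rng.random() < mask_prob:
            # The masking decision is made once per word, not per sub-token.
            masked.extend([MASK] * len(pieces))
        else:
            masked.extend(pieces)
    return masked
```

With token-level masking, a word such as `["play", "##ing"]` could have only one of its pieces hidden. Here both pieces are hidden or kept together, which is the property that distinguishes whole-word masking.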


3: [Invention] Invoice identification method and device

Application No.: 201911381468.9; Publication No.: CN111126319A; Main classification: G06K9/00

Applicant: Shandong Banner Information Co., Ltd. (山东旗帜信息有限公司); Filing date: 2019.12.27; Publication date: 2020.05.08

Inventors: Du Mingben (杜明本); Zhong Qinlong (钟琴隆); Du Zhicheng (杜志诚); Yu Wencai (于文才); Sun Fanbo (孙凡波); Sun Pin (孙品); Yue Meng (岳猛); Yin Zhongyuan (殷忠源)

Abstract: An invoice identification method and device, comprising the following steps: acquiring an invoice; cutting the invoice by content to obtain partitions; cutting the content of each partition by item to obtain subsets; recognizing each subset to obtain its content; and restoring the content of all subsets to their corresponding positions on the invoice to obtain the invoice content. The application exploits the characteristics of the invoice itself to segment it twice, first into partitions and then into subsets. Because partitions and subsets are cut in sequence, recognition can be targeted to the specific content of each subset, which can improve recognition speed and accuracy.
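The two-stage cut described in this abstract (partitions first, then subsets, then per-subset recognition and reassembly) can be shown as a toy Python pipeline. The abstract concerns invoice images; this sketch substitutes plain text lines so that it stays self-contained, and every name in it (`split_partitions`, `split_subsets`, `recognize`, `recognize_invoice`) is hypothetical. Blank lines stand in for partition boundaries, each line stands in for one item, and parsing `key: value` stands in for the recognizer.

```python
def split_partitions(invoice_lines):
    """Stage 1: group consecutive non-blank lines into partitions."""
    partitions, current = [], []
    for line in invoice_lines:
        if line.strip():
            current.append(line)
        elif current:
            partitions.append(current)
            current = []
    if current:
        partitions.append(current)
    return partitions


def split_subsets(partition):
    """Stage 2: cut a partition into subsets, here one subset per line."""
    return list(partition)


def recognize(subset):
    """Placeholder recognizer: parse a 'key: value' line into a pair."""
    key, _, value = subset.partition(":")
    return key.strip(), value.strip()


def recognize_invoice(invoice_lines):
    """Recognize each subset and put results back in partition order."""
    result = []
    for partition in split_partitions(invoice_lines):
        result.append(dict(recognize(s) for s in split_subsets(partition)))
    return result
```

Because the output keeps one dictionary per partition in the original order, each recognized value stays attached to the position it came from. A real system would replace `recognize` with a recognizer suited to the content type of each subset.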


4: [Invention] Data crawling method and device

Application No.: 202010105861.1; Publication No.: CN111259224A; Main classification: G06F16/951

Applicant: Shandong Banner Information Co., Ltd. (山东旗帜信息有限公司); Filing date: 2020.02.20; Publication date: 2020.06.09

Inventors: Zhong Qinlong (钟琴隆); Du Zhicheng (杜志诚); Du Mingben (杜明本); Yu Wencai (于文才); Ma Qiang (马强); Liu Xia (刘霞); Wang Dongdong (王冬冬); Li Chunyong (李春勇)

Abstract: A data crawling method and device, comprising the following steps: crawling data; finding and locating the erroneous data in the crawled data, locating the code corresponding to the erroneous data, and finding the encryption code; and obtaining a key from the encryption code and using it together with the code corresponding to the erroneous data to correct the data, yielding the corrected data. The application obtains the original data by analyzing the code at the data's source, and obtains the key by analyzing the code at the source of the erroneous data, from which the correct data is then derived.

