A word is the smallest meaningful unit of language that can be used on its own. Chinese, however, is written character by character, with no explicit delimiters between words, so lexical analysis is both the foundation of Chinese information processing and a key step in it. Building on years of research, the Institute of Computing Technology, Chinese Academy of Sciences, spent one year developing ICTCLAS (Institute of Computing Technology, Chinese Lexical Analysis System), a Chinese lexical analysis system based on a multi-layer hidden Markov model (HMM). ICTCLAS performs word segmentation, part-of-speech (POS) tagging, and unknown-word recognition. Its segmentation precision is 97.58% (from the most recent evaluation by the national 973 Program expert group). Unknown-word recognition based on role tagging achieves recall above 90%, and recall for Chinese person names is close to 98%. Word segmentation and POS tagging run at 543.5 KB/s.
ICTCLAS and 14 other systems released free of charge by the Institute of Computing Technology have been widely reported by media in China and abroad. As of September, ICTCLAS had been downloaded by more than 30,000 researchers and commercial organizations from China, Japan, Singapore, Korea, the USA, and other countries and regions. We are honored to distribute ICTCLAS free of charge and to help users solve Chinese lexical analysis problems.
ICTCLAS also ships with a complete dynamic link library (ICTCLAS.dll), a COM component, and the corresponding probabilistic dictionary. Developers can call ICTCLAS directly from their own systems without dealing with the details of Chinese lexical analysis. ICTCLAS can output multiple high-probability results on request, and the output format is customizable, so developers can build higher-level applications on top of word segmentation and POS tagging.
Engineers and researchers in related fields are welcome to use ICTCLAS. Questions, comments, and advice are appreciated.
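To give a feel for what segmentation with a probabilistic dictionary involves, here is a minimal C++ sketch. It is not the multi-layer HMM used by ICTCLAS and it does not call ICTCLAS.dll. It is a plain unigram model solved by dynamic programming. The names Lexicon and segment, the parameters max_word_len and unknown_prob, and the dictionary layout are all hypothetical and chosen only for illustration. To keep it short, the sketch treats each byte as one character, so it is only meaningful for single-byte text; real Chinese input would first have to be decoded into characters.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <limits>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical toy lexicon: word -> probability.
// This is NOT the ICTCLAS dictionary format.
using Lexicon = std::unordered_map<std::string, double>;

// Returns the most probable split of `text` under a unigram model,
// found by dynamic programming over every possible word boundary.
// A substring missing from the lexicon is accepted only when it is a
// single character, with the small fallback probability `unknown_prob`.
std::vector<std::string> segment(const std::string& text, const Lexicon& lex,
                                 std::size_t max_word_len = 8,
                                 double unknown_prob = 1e-8) {
    const std::size_t n = text.size();
    const double neg_inf = -std::numeric_limits<double>::infinity();
    std::vector<double> best(n + 1, neg_inf);  // best log-probability of text[0, i)
    std::vector<std::size_t> prev(n + 1, 0);   // start index of the last word ending at i
    best[0] = 0.0;

    for (std::size_t i = 1; i <= n; ++i) {
        for (std::size_t len = 1; len <= max_word_len && len <= i; ++len) {
            const std::size_t j = i - len;
            if (best[j] == neg_inf) continue;
            const auto it = lex.find(text.substr(j, len));
            double p = 0.0;
            if (it != lex.end()) {
                p = it->second;
            } else if (len == 1) {
                p = unknown_prob;  // unknown single character
            } else {
                continue;          // unknown multi-character string: not a word
            }
            const double score = best[j] + std::log(p);
            if (score > best[i]) {
                best[i] = score;
                prev[i] = j;
            }
        }
    }

    // Walk the back-pointers from the end of the text to recover the words.
    std::vector<std::string> words;
    for (std::size_t i = n; i > 0; i = prev[i]) {
        words.push_back(text.substr(prev[i], i - prev[i]));
    }
    std::reverse(words.begin(), words.end());
    return words;
}
```

For example, with a lexicon that maps i, love, new, york, and newyork to 0.05, 0.01, 0.02, 0.01, and 0.001, segment("ilovenewyork", lex) yields i / love / newyork, because the single entry newyork (0.001) outscores new followed by york (0.02 × 0.01 = 0.0002). A system such as ICTCLAS goes well beyond this by modeling tag sequences and unknown words.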
Author: Kevin Zhang (张华平)
Institute: Institute of Computing Technology, Chinese Academy of Sciences
Email: zhanghp@software.ict.ac.cn
Tel: +86-10-88449181 ext. 718
Created: 2002-08-16 11:26:42
License: Open Resource License for Natural Language Processing (自然语言处理开放资源许可证)
Platform: Win9X, Win2000, Win NT, Win XP, Linux
Programming language: C/C++