Introduction to the American National Corpus (ANC)
Author: admin    Source: original article    Updated: 2011/11/16

ANC = The American National Corpus

http://www.anc.org/ 

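The American National Corpus (ANC) was, at the time of writing, the largest corpus documenting current American English usage. It contains written texts and transcripts of spoken material produced from 1990 onward. The ANC has been published in two releases: the first contains more than 10 million words of spoken and written American English, and the second more than 22 million.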

The First Release of the ANC

The First Release of the ANC is a beta version. It contains over 10,000,000 words of written and spoken American English, annotated for lemma and part of speech. It is available for research and education for a nominal licensing fee from the Linguistic Data Consortium (LDC). Commercial users can obtain the corpus and gain rights to use it in commercial products by joining the ANC Consortium.

The first 10 million words of the ANC consist of the texts that were received first, so the corpus is not balanced. Neither the XML tagging nor the part-of-speech tags have been hand-validated. Headers are minimal, although they contain fairly complete information on domain, subdomain, subject, audience, and medium. Check the list of known bugs and caveats for a description of the limitations we are currently aware of.

One aim of releasing this first 10 million words is to get feedback from the community on the corpus structure and annotation, so that any necessary changes can be made before the final release of the full 100 million words. We therefore invite comments and bug reports from ANC users. Please contact anc@cs.vassar.edu.

The Second Release of the ANC

The Second Release of the American National Corpus contains over 22,000,000 words of written and spoken American English, annotated for lemma, part of speech, noun chunks, and verb chunks. All data in the Second Release carry part-of-speech tags from the Penn Treebank tagset, and many documents are also tagged with the Biber tagset.
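To make lemma, part-of-speech, and noun-chunk annotation concrete, here is a minimal Python sketch. It does not read the ANC's actual files. The `Token` record, the example sentence, and the `noun_chunks` heuristic are all hypothetical illustrations. Only the Penn Treebank tag names come from the description above, and the chunking rule is a naive assumption, not the method used to annotate the corpus.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """One annotated token: surface form, lemma, and Penn Treebank POS tag."""
    form: str
    lemma: str
    pos: str


# Hypothetical, hand-annotated sentence; NOT taken from the ANC data files.
sentence = [
    Token("The", "the", "DT"),
    Token("corpora", "corpus", "NNS"),
    Token("were", "be", "VBD"),
    Token("released", "release", "VBN"),
    Token("in", "in", "IN"),
    Token("stages", "stage", "NNS"),
    Token(".", ".", "."),
]


def pos_counts(tokens: list[Token]) -> Counter:
    """Frequency of each Penn Treebank tag."""
    return Counter(t.pos for t in tokens)


def lemma_counts(tokens: list[Token]) -> Counter:
    """Frequency of each lemma, so inflected forms such as 'were' count toward 'be'."""
    return Counter(t.lemma for t in tokens)


# Tags allowed inside a noun chunk in this naive heuristic (an assumption,
# not the chunking rules used to annotate the ANC).
NP_TAGS = {"DT", "PRP$", "CD", "JJ", "JJR", "JJS", "NN", "NNS", "NNP", "NNPS"}


def noun_chunks(tokens: list[Token]) -> list[str]:
    """Return maximal runs of NP_TAGS tokens that contain at least one noun."""
    chunks: list[str] = []
    current: list[Token] = []
    for tok in [*tokens, None]:  # trailing None flushes the last run
        if tok is not None and tok.pos in NP_TAGS:
            current.append(tok)
            continue
        if any(t.pos.startswith("NN") for t in current):
            chunks.append(" ".join(t.form for t in current))
        current = []
    return chunks


if __name__ == "__main__":
    print(pos_counts(sentence).most_common(1))  # [('NNS', 2)]
    print(noun_chunks(sentence))                # ['The corpora', 'stages']
```

The corpus itself stores its annotation as XML, so consult the release documentation for the real file layout instead of relying on this in-memory representation.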

The ANC Second Release is available for research and education for a nominal licensing fee from the Linguistic Data Consortium. Commercial users can obtain the corpus and gain rights to use it in commercial products by joining the ANC Consortium. Please consult the LDC Catalog entry for the ANC Second Release.

The First and Second Releases of the ANC include the materials acquired to date, so the current release is not balanced. Neither the XML tagging nor the annotation has been hand-validated. Headers are typically minimal, although most contain complete information on domain, subdomain, subject, audience, and medium. Check the list of known bugs and caveats for a description of the limitations we are currently aware of.

One aim of the Second Release is to get feedback from the community on the corpus structure and annotation, so that any necessary changes can be made before the final release of the full 100 million words. We therefore invite comments and bug reports from ANC users. Please contact anc@cs.vassar.edu.

