Introduction to the British National Corpus (BNC)
Author: admin    Source: original to this site    Updated: 2011/11/16


BNC = The British National Corpus

http://www.natcorp.ox.ac.uk/ (BNC website)

http://corpus.byu.edu/bnc/ (BNC website)

 

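The British National Corpus (BNC) is the largest corpus that could be used directly online at the time of writing (2011). It is a large-scale corpus developed jointly by Oxford University Press, Longman, Chambers Harrap, Oxford University Computing Services, Lancaster University's Unit for Computer Research on the English Language (UCREL), and the British Library, and it was completed in 1994.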

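The BNC is an electronic resource of 100 million words sampled from a wide range of written and spoken sources, designed to represent British English of the late 20th century. It consists of 4,124 texts covering a broad range of modern British English; written material makes up 90% and spoken material 10%. The latest edition is the BNC XML Edition (2007). The corpus was originally marked up in SGML, an international standard, and the 2007 edition uses XML. Part-of-speech tagging is described as a three-stage process that reduced the tagging error rate from 3% to 1%. The corpus can be queried with its companion search software SARA or with a range of general-purpose corpus tools, and it can also be searched online.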

What is the BNC?

The British National Corpus (BNC) is a 100 million word collection of samples of written and spoken language from a wide range of sources, designed to represent a wide cross-section of British English from the later part of the 20th century, both spoken and written. The latest edition is the BNC XML Edition, released in 2007.

The written part of the BNC (90%) includes, for example, extracts from regional and national newspapers, specialist periodicals and journals for all ages and interests, academic books and popular fiction, published and unpublished letters and memoranda, and school and university essays, among many other kinds of text. The spoken part (10%) consists of orthographic transcriptions of unscripted informal conversations (recorded by volunteers selected from different age groups, regions and social classes in a demographically balanced way) and spoken language collected in different contexts, ranging from formal business or government meetings to radio shows and phone-ins.

The corpus is encoded according to the Guidelines of the Text Encoding Initiative (TEI) to represent both the output from CLAWS (automatic part-of-speech tagger) and a variety of other structural properties of texts (e.g. headings, paragraphs, lists etc.). Full classification, contextual and bibliographic information is also included with each text in the form of a TEI-conformant header.
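As a rough illustration of what this encoding looks like in practice, the sketch below reads part-of-speech tags from a small XML fragment written in the style of the BNC XML Edition. The fragment itself is made up for this example, and the element and attribute names (`s`, `w`, `c`, `c5`) are assumptions based on that edition's markup; the Reference Guide remains the authority on the actual schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical sentence in BNC-XML-like markup (not taken from the corpus).
# Assumed conventions: <s> = sentence, <w> = word, <c> = punctuation,
# c5 = CLAWS C5 part-of-speech tag.
SAMPLE = (
    '<s n="1">'
    '<w c5="AT0" hw="the" pos="ART">The </w>'
    '<w c5="NN1" hw="corpus" pos="SUBST">corpus </w>'
    '<w c5="VBZ" hw="be" pos="VERB">is </w>'
    '<w c5="AJ0" hw="large" pos="ADJ">large</w>'
    '<c c5="PUN">.</c>'
    '</s>'
)


def tagged_tokens(sentence_xml):
    """Return (token, C5 tag) pairs for every word and punctuation element."""
    root = ET.fromstring(sentence_xml)
    tokens = []
    for element in root.iter():
        if element.tag in ("w", "c"):
            # Token text may carry trailing whitespace in the markup.
            tokens.append(((element.text or "").strip(), element.get("c5")))
    return tokens


tokens = tagged_tokens(SAMPLE)
for token, tag in tokens:
    print(f"{token}\t{tag}")
```

The same loop could be pointed at a full corpus file by parsing it with `ET.parse` instead of `ET.fromstring`, assuming the file follows the markup sketched here.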

Work on building the corpus began in 1991, and was completed in 1994. No new texts have been added after the completion of the project but the corpus was slightly revised prior to the release of the second edition BNC World (2001) and the third edition BNC XML Edition (2007). Since the completion of the project, two sub-corpora with material from the BNC have been released separately: the BNC Sampler (a general collection of one million written words, one million spoken) and the BNC Baby (four one-million word samples from four different genres).

 

Full technical documentation covering all aspects of the BNC, including its design, markup, and contents, is provided by the Reference Guide for the British National Corpus (XML Edition). For earlier versions of the Reference Guide and other documentation, see the BNC Archive page.

What sort of corpus is the BNC?

Monolingual: It deals with modern British English, not other languages used in Britain. However, non-British English and foreign-language words do occur in the corpus.

Synchronic: It covers British English of the late twentieth century, rather than the historical development which produced it.

General: It includes many different styles and varieties, and is not limited to any particular subject field, genre or register. In particular, it contains examples of both spoken and written language.

Sample: For written sources, samples of 45,000 words are taken from various parts of single-author texts. Shorter texts up to a maximum of 45,000 words, or multi-author texts such as magazines and newspapers, are included in full. Sampling allows for a wider coverage of texts within the 100 million limit, and avoids over-representing idiosyncratic texts.
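The sampling rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `take_sample`, the `single_author` flag, and the choice of a random contiguous slice are all hypothetical, since the text says only that samples come "from various parts" of a text and does not say how the starting point was chosen.

```python
import random

SAMPLE_LIMIT = 45_000  # maximum sample size in words, as stated in the text


def take_sample(words, single_author=True, limit=SAMPLE_LIMIT, rng=None):
    """Return the words to include for one text under the sampling rule.

    Short texts and multi-author texts are kept in full; longer
    single-author texts contribute one contiguous slice of `limit` words.
    The random start position is an assumption made for this sketch.
    """
    if not single_author or len(words) <= limit:
        return list(words)
    rng = rng or random.Random()
    start = rng.randrange(len(words) - limit + 1)
    return words[start:start + limit]


short_text = ["word"] * 1_000
long_text = [f"w{i}" for i in range(120_000)]

short_sample = take_sample(short_text)
long_sample = take_sample(long_text, rng=random.Random(0))
magazine = take_sample(long_text, single_author=False)

print(len(short_sample), len(long_sample), len(magazine))
```

Capping each long text at a fixed size is what lets the 100 million word budget stretch across more texts, which is the coverage argument the paragraph makes.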

 
