Here’s some good news for the beginning of the term: all Lancaster University staff and students now have access to Sketch Engine, an online tool for the analysis of linguistic data. Sketch Engine is used by major publishers (CUP, OUP, Macmillan, etc.) to produce dictionaries and grammar books. It can also be used for a wide range of research projects involving the analysis of language and discourse. Sketch Engine offers access to a large number of corpora in over 85 different languages. Many of the web-based corpora available through Sketch Engine contain billions of words, which can be analysed easily via the online interface.
In Sketch Engine, you can, for example:
- Search and analyse corpora via a web browser.
- Create word sketches, which summarise the use of words in different grammatical frames.
- Load and grammatically annotate your own data.
- Use parallel (translation) corpora in many languages.
- Crawl the web and collect texts that include a combination of user-defined keywords.
- Much more.
How do I connect to Sketch Engine?
1. Go to https://the.sketchengine.co.uk/login/
2. Click on ‘Authenticate using your institution account (Single Sign On)’.
3. Select ‘Lancaster University’ from the drop-down menu and use your Lancaster login details to log on. That’s all – you can start exploring corpora straight away!
Other corpus tools
Many other tools for the analysis of language and corpora are also available to Lancaster University staff and students (and to others, of course!). The following table provides an overview of some of them.
| Tool | Analysis of own data | Provides corpora | Brief description |
|---|---|---|---|
| Desktop (offline) tools | | | |
| #LancsBox | YES | YES | This tool runs on all major operating systems (Windows, Linux, Mac). It has a simple, easy-to-use interface and allows searching and comparing corpora (your own data as well as the corpora provided). In addition, #LancsBox provides unique visualisation tools for analysing frequency, dispersion, keywords and collocations. |
| Web-based (online) tools | | | |
| CQPweb | NO | YES | This tool offers a range of pre-loaded corpora for English (current and historical) and other languages, including Arabic, Italian, Hindi and Chinese. It includes the BNC 2014 Spoken, a brand-new 10-million-word corpus of current informal British speech. It offers a number of powerful analytical functions. The tool is freely available from https://cqpweb.lancs.ac.uk/ |
| Wmatrix | YES | NO | This tool allows users to process their own data and add part-of-speech and semantic annotation. Corpora can also be searched and compared with reference wordlists. Wmatrix is available from http://ucrel.lancs.ac.uk/wmatrix/ |