Data for text mining and analysis

1 to 10 of 282 Results
Apr 24, 2023
Davies, Mark, 2021, "Corpus of Contemporary American English (COCA)", https://doi.org/10.25346/S6/Z36KRR, UCLA Dataverse, V7
The Corpus of Contemporary American English (COCA) is the only large and "representative" corpus of American English. COCA is probably the most widely used corpus of English, and it is related to many other corpora of English that we have created. These corpora were formerly know...
Adobe PDF - 5.0 MB - MD5: 32128d2b4426f7e00f9898fab3731f43
Documentation
Overview of the Corpus of Contemporary American English (COCA). PDF format.
Apr 21, 2023
Davies, Mark, 2023, "Coronavirus Corpus", https://doi.org/10.25346/S6/6WMNQU, UCLA Dataverse, V5
The Coronavirus Corpus contains about 1.5 billion words of data in approximately 1.9 million texts from Jan 2020 - Dec 2022, and it is designed to be the definitive record of the social, cultural, and economic impact of the coronavirus (COVID-19) during this time. The corpus allo...
Apr 21, 2023 - Coronavirus Corpus
TAR Archive - 3.7 GB - MD5: f74910d2bb99babc19bc49e688c02c8a
Apr 21, 2023 - Coronavirus Corpus
TAR Archive - 48.6 MB - MD5: 9ff06fc5f4d9552b4e65e44dff5b7a8b
Apr 21, 2023 - Coronavirus Corpus
TAR Archive - 69.6 MB - MD5: 5a0c2a85fdc169266ae65ca27f331fcb
Apr 21, 2023 - Coronavirus Corpus
TAR Archive - 1.5 GB - MD5: e9ec3543813cc19c271d29bfe37e14c0
Apr 21, 2023 - Coronavirus Corpus
TAR Archive - 5.5 GB - MD5: 7ded04e6cfec943aa5319d56358bc33d
Apr 21, 2023 - Coronavirus Corpus
TAR Archive - 863.8 MB - MD5: 5ffe482599157c4ec5ab152763aeb3e7
Apr 21, 2023 - Coronavirus Corpus
TAR Archive - 499.9 MB - MD5: c57a1ef668b604fa093355b7b7f3cb51