Files accessible to UCLA Faculty/Staff/Students only. Please log on to download.
1 to 6 of 6 Results
Apr 24, 2023 - Text and Data Mining
Davies, Mark, 2021, "Corpus of Contemporary American English (COCA)", https://doi.org/10.25346/S6/Z36KRR, UCLA Dataverse, V7
The Corpus of Contemporary American English (COCA) is the only large and "representative" corpus of American English. COCA is probably the most widely-used corpus of English, and it is related to many other corpora of English that we have created. These corpora were formerly know...
Text and Data Mining (University of California, Los Angeles)
Apr 24, 2023
Data for text mining and analysis
Apr 21, 2023 - Text and Data Mining
Davies, Mark, 2023, "Coronavirus Corpus", https://doi.org/10.25346/S6/6WMNQU, UCLA Dataverse, V5
The Coronavirus Corpus contains about 1.5 billion words of data in approximately 1.9 million texts from Jan 2020 - Dec 2022, and it is designed to be the definitive record of the social, cultural, and economic impact of the coronavirus (COVID-19) during this time. The corpus allo...
Apr 21, 2023 - Text and Data Mining
Davies, Mark, 2023, "News on the Web (NOW)", https://doi.org/10.25346/S6/8DCOWV, UCLA Dataverse, V14
The NOW corpus (News on the Web) contains 17.2 billion words of data from web-based newspapers and magazines from 2010 to the present time (the most recent day is 2023-04-23). More importantly, the corpus grows by about 180-200 million words of data each month (from about 300,000...
May 18, 2022
L2 (Firm), 2022, "L2 VoterMapping", https://doi.org/10.25346/S6/LBF0XN, UCLA Dataverse, V1
L2 provides a voter file for the United States. To create this file, L2 processes registered voter data on an ongoing basis for all 50 states and the District of Columbia, with refreshes of the underlying state voter data typically at least every six months and refreshes of telep...
May 12, 2021 - Text and Data Mining
Davies, Mark, 2021, "Corpus of Historical American English (COHA)", https://doi.org/10.25346/S6/6I8JL1, UCLA Dataverse, V12
The Corpus of Historical American English (COHA) is the largest structured corpus of historical English. It is related to many other corpora of English that we have created. These corpora were formerly known as the "BYU Corpora", and they offer unparalleled insight into variation...