Text and Data Mining

Data for text mining and analysis

Corpus of Contemporary American English (COCA)

Apr 24, 2023

Davies, Mark, 2021, "Corpus of Contemporary American English (COCA)", https://doi.org/10.25346/S6/Z36KRR, UCLA Dataverse, V7

The Corpus of Contemporary American English (COCA) is the only large and "representative" corpus of American English. COCA is probably the most widely-used corpus of English, and it is related to many other corpora of English that we have created. These corpora were formerly know...

Coronavirus Corpus

Apr 21, 2023

Davies, Mark, 2023, "Coronavirus Corpus", https://doi.org/10.25346/S6/6WMNQU, UCLA Dataverse, V5

The Coronavirus Corpus contains about 1.5 billion words of data in approximately 1.9 million texts from Jan 2020 - Dec 2022, and it is designed to be the definitive record of the social, cultural, and economic impact of the coronavirus (COVID-19) during this time. The corpus allo...

News on the Web (NOW)

Apr 21, 2023

Davies, Mark, 2023, "News on the Web (NOW)", https://doi.org/10.25346/S6/8DCOWV, UCLA Dataverse, V14

The NOW corpus (News on the Web) contains 17.2 billion words of data from web-based newspapers and magazines from 2010 to the present time (the most recent day is 2023-04-23). More importantly, the corpus grows by about 180-200 million words of data each month (from about 300,000...

Corpus of Historical American English (COHA)

May 12, 2021

Davies, Mark, 2021, "Corpus of Historical American English (COHA)", https://doi.org/10.25346/S6/6I8JL1, UCLA Dataverse, V12

The Corpus of Historical American English (COHA) is the largest structured corpus of historical English. It is related to many other corpora of English that we have created. These corpora were formerly known as the "BYU Corpora", and they offer unparalleled insight into variation...
