Update files from the datasets library (from 1.3.0)
Release notes: https://github.com/huggingface/datasets/releases/tag/1.3.0
parent 9fd6e32e8d
commit 09c7283c8b

README.md (new file, 149 lines)
@@ -0,0 +1,149 @@
---
---

# Dataset Card for "imdb"

## Table of Contents
- [Dataset Description](#dataset-description)
  - [Dataset Summary](#dataset-summary)
  - [Supported Tasks](#supported-tasks)
  - [Languages](#languages)
- [Dataset Structure](#dataset-structure)
  - [Data Instances](#data-instances)
  - [Data Fields](#data-fields)
  - [Data Splits Sample Size](#data-splits-sample-size)
- [Dataset Creation](#dataset-creation)
  - [Curation Rationale](#curation-rationale)
  - [Source Data](#source-data)
  - [Annotations](#annotations)
  - [Personal and Sensitive Information](#personal-and-sensitive-information)
- [Considerations for Using the Data](#considerations-for-using-the-data)
  - [Social Impact of Dataset](#social-impact-of-dataset)
  - [Discussion of Biases](#discussion-of-biases)
  - [Other Known Limitations](#other-known-limitations)
- [Additional Information](#additional-information)
  - [Dataset Curators](#dataset-curators)
  - [Licensing Information](#licensing-information)
  - [Citation Information](#citation-information)
  - [Contributions](#contributions)

## [Dataset Description](#dataset-description)

- **Homepage:** [http://ai.stanford.edu/~amaas/data/sentiment/](http://ai.stanford.edu/~amaas/data/sentiment/)
- **Repository:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
- **Paper:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
- **Point of Contact:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
- **Size of downloaded dataset files:** 80.23 MB
- **Size of the generated dataset:** 127.06 MB
- **Total amount of disk used:** 207.28 MB

### [Dataset Summary](#dataset-summary)

Large Movie Review Dataset.
This is a dataset for binary sentiment classification containing substantially more data than previous benchmark datasets. We provide a set of 25,000 highly polar movie reviews for training, and 25,000 for testing. There is additional unlabeled data for use as well.

### [Supported Tasks](#supported-tasks)

[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

### [Languages](#languages)

[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

## [Dataset Structure](#dataset-structure)

We show detailed information for up to 5 configurations of the dataset.

### [Data Instances](#data-instances)

#### plain_text

- **Size of downloaded dataset files:** 80.23 MB
- **Size of the generated dataset:** 127.06 MB
- **Total amount of disk used:** 207.28 MB

An example of 'train' looks as follows.
```
{
    "label": 0,
    "text": "Goodbye world2\n"
}
```
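
To inspect real records rather than the placeholder above, the train split can be loaded with the `datasets` library. The following is a minimal sketch, assuming `datasets` is installed and the dataset is available under the name `imdb`, as this card's title suggests. The helper names `is_valid_instance` and `load_imdb_train` are illustrative only and are not part of the library.

```python
def is_valid_instance(example):
    """Check that a labeled record has a string `text` and a 0/1 `label`."""
    return (
        isinstance(example, dict)
        and isinstance(example.get("text"), str)
        and example.get("label") in (0, 1)
    )


def load_imdb_train():
    """Load the labeled train split (downloads the data on first use)."""
    # Imported lazily so the shape check above works without `datasets` installed.
    from datasets import load_dataset

    return load_dataset("imdb", split="train")


# The placeholder instance shown in this card passes the shape check.
sample = {"label": 0, "text": "Goodbye world2\n"}
print(is_valid_instance(sample))
```

Calling `load_imdb_train()` and indexing the result (for example `load_imdb_train()[0]`) should return a dictionary with the same two keys, which can then be passed to `is_valid_instance`.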

### [Data Fields](#data-fields)

The data fields are the same among all splits.

#### plain_text
- `text`: a `string` feature.
- `label`: a classification label, with possible values including `neg` (0), `pos` (1).
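
The integer-to-name mapping described above can be sketched in plain Python. This is an illustration only: `LABEL_NAMES`, `label_to_name`, and `name_to_label` are hypothetical helpers written for this example, not functions of the `datasets` library.

```python
# Label names in index order, as listed in the field description: neg = 0, pos = 1.
LABEL_NAMES = ("neg", "pos")


def label_to_name(label: int) -> str:
    """Map an integer label (0 or 1) to its name."""
    if (
        isinstance(label, bool)
        or not isinstance(label, int)
        or not 0 <= label < len(LABEL_NAMES)
    ):
        raise ValueError(f"unknown label: {label!r}")
    return LABEL_NAMES[label]


def name_to_label(name: str) -> int:
    """Map a label name ("neg" or "pos") back to its integer id."""
    try:
        return LABEL_NAMES.index(name)
    except ValueError:
        raise ValueError(f"unknown label name: {name!r}") from None


print(label_to_name(0))      # neg
print(name_to_label("pos"))  # 1
```

Keeping the names in a tuple ordered by label id means both directions of the mapping come from a single definition.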

### [Data Splits Sample Size](#data-splits-sample-size)

| name       | train | unsupervised | test  |
|------------|------:|-------------:|------:|
| plain_text | 25000 |        50000 | 25000 |
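
As a quick check on the table, the counts can be tallied in plain Python. Treating `unsupervised` as the "additional unlabeled data" mentioned in the summary is an inference from the split name, not something this card states explicitly; the variable names below are illustrative.

```python
# Split sizes for the `plain_text` configuration, copied from the table above.
SPLIT_SIZES = {"train": 25000, "unsupervised": 50000, "test": 25000}

# Splits assumed to carry sentiment labels (the train and test sets in the summary).
LABELED_SPLITS = ("train", "test")

total_reviews = sum(SPLIT_SIZES.values())
labeled_reviews = sum(SPLIT_SIZES[name] for name in LABELED_SPLITS)
unlabeled_fraction = SPLIT_SIZES["unsupervised"] / total_reviews

print(total_reviews)       # 100000
print(labeled_reviews)     # 50000
print(unlabeled_fraction)  # 0.5
```

Under that assumption, half of the 100,000 reviews are labeled and half are unlabeled.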

## [Dataset Creation](#dataset-creation)

### [Curation Rationale](#curation-rationale)

[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

### [Source Data](#source-data)

[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

### [Annotations](#annotations)

[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

### [Personal and Sensitive Information](#personal-and-sensitive-information)

[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

## [Considerations for Using the Data](#considerations-for-using-the-data)

### [Social Impact of Dataset](#social-impact-of-dataset)

[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

### [Discussion of Biases](#discussion-of-biases)

[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

### [Other Known Limitations](#other-known-limitations)

[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

## [Additional Information](#additional-information)

### [Dataset Curators](#dataset-curators)

[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

### [Licensing Information](#licensing-information)

[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

### [Citation Information](#citation-information)

```
@InProceedings{maas-EtAl:2011:ACL-HLT2011,
  author = {Maas, Andrew L. and Daly, Raymond E. and Pham, Peter T. and Huang, Dan and Ng, Andrew Y. and Potts, Christopher},
  title = {Learning Word Vectors for Sentiment Analysis},
  booktitle = {Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies},
  month = {June},
  year = {2011},
  address = {Portland, Oregon, USA},
  publisher = {Association for Computational Linguistics},
  pages = {142--150},
  url = {http://www.aclweb.org/anthology/P11-1015}
}
```

### Contributions

Thanks to [@ghazi-f](https://github.com/ghazi-f), [@patrickvonplaten](https://github.com/patrickvonplaten), [@lhoestq](https://github.com/lhoestq), [@thomwolf](https://github.com/thomwolf) for adding this dataset.