Update files from the datasets library (from 1.3.0)

Release notes: https://github.com/huggingface/datasets/releases/tag/1.3.0
2022-01-25 16:37:43 +01:00 · 09c7283c8b
commit 09c7283c8b
parent 9fd6e32e8d
1 changed file with 149 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,149 @@
+---
+---
+
+# Dataset Card for "imdb"
+
+## Table of Contents
+- [Dataset Description](#dataset-description)
+  - [Dataset Summary](#dataset-summary)
+  - [Supported Tasks](#supported-tasks)
+  - [Languages](#languages)
+- [Dataset Structure](#dataset-structure)
+  - [Data Instances](#data-instances)
+  - [Data Fields](#data-fields)
+  - [Data Splits Sample Size](#data-splits-sample-size)
+- [Dataset Creation](#dataset-creation)
+  - [Curation Rationale](#curation-rationale)
+  - [Source Data](#source-data)
+  - [Annotations](#annotations)
+  - [Personal and Sensitive Information](#personal-and-sensitive-information)
+- [Considerations for Using the Data](#considerations-for-using-the-data)
+  - [Social Impact of Dataset](#social-impact-of-dataset)
+  - [Discussion of Biases](#discussion-of-biases)
+  - [Other Known Limitations](#other-known-limitations)
+- [Additional Information](#additional-information)
+  - [Dataset Curators](#dataset-curators)
+  - [Licensing Information](#licensing-information)
+  - [Citation Information](#citation-information)
+  - [Contributions](#contributions)
+
+## [Dataset Description](#dataset-description)
+
+- **Homepage:** [http://ai.stanford.edu/~amaas/data/sentiment/](http://ai.stanford.edu/~amaas/data/sentiment/)
+- **Repository:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
+- **Paper:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
+- **Point of Contact:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
+- **Size of downloaded dataset files:** 80.23 MB
+- **Size of the generated dataset:** 127.06 MB
+- **Total amount of disk used:** 207.28 MB
+
+### [Dataset Summary](#dataset-summary)
+
+Large Movie Review Dataset.
+This is a dataset for binary sentiment classification containing substantially more data than previous benchmark datasets. We provide a set of 25,000 highly polar movie reviews for training, and 25,000 for testing. An additional 50,000 unlabeled reviews are also available.
+
+### [Supported Tasks](#supported-tasks)
+
+[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
+
+### [Languages](#languages)
+
+[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
+
+## [Dataset Structure](#dataset-structure)
+
+We show detailed information for up to 5 configurations of the dataset.
+
+### [Data Instances](#data-instances)
+
+#### plain_text
+
+- **Size of downloaded dataset files:** 80.23 MB
+- **Size of the generated dataset:** 127.06 MB
+- **Total amount of disk used:** 207.28 MB
+
+An example from the 'train' split looks as follows.
+```
+{
+    "label": 0,
+    "text": "Goodbye world2\n"
+}
+```
+
+### [Data Fields](#data-fields)
+
+The data fields are the same among all splits.
+
+#### plain_text
+- `text`: a `string` feature.
+- `label`: a classification label, with possible values `neg` (0) and `pos` (1).
+
+### [Data Splits Sample Size](#data-splits-sample-size)
+
+|   name   |train|unsupervised|test |
+|----------|----:|-----------:|----:|
+|plain_text|25000|       50000|25000|
+
+## [Dataset Creation](#dataset-creation)
+
+### [Curation Rationale](#curation-rationale)
+
+[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
+
+### [Source Data](#source-data)
+
+[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
+
+### [Annotations](#annotations)
+
+[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
+
+### [Personal and Sensitive Information](#personal-and-sensitive-information)
+
+[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
+
+## [Considerations for Using the Data](#considerations-for-using-the-data)
+
+### [Social Impact of Dataset](#social-impact-of-dataset)
+
+[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
+
+### [Discussion of Biases](#discussion-of-biases)
+
+[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
+
+### [Other Known Limitations](#other-known-limitations)
+
+[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
+
+## [Additional Information](#additional-information)
+
+### [Dataset Curators](#dataset-curators)
+
+[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
+
+### [Licensing Information](#licensing-information)
+
+[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
+
+### [Citation Information](#citation-information)
+
+```
+@InProceedings{maas-EtAl:2011:ACL-HLT2011,
+  author    = {Maas, Andrew L.  and  Daly, Raymond E.  and  Pham, Peter T.  and  Huang, Dan  and  Ng, Andrew Y.  and  Potts, Christopher},
+  title     = {Learning Word Vectors for Sentiment Analysis},
+  booktitle = {Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies},
+  month     = {June},
+  year      = {2011},
+  address   = {Portland, Oregon, USA},
+  publisher = {Association for Computational Linguistics},
+  pages     = {142--150},
+  url       = {http://www.aclweb.org/anthology/P11-1015}
+}
+
+```
+
+
+### Contributions
+
+Thanks to [@ghazi-f](https://github.com/ghazi-f), [@patrickvonplaten](https://github.com/patrickvonplaten), [@lhoestq](https://github.com/lhoestq), [@thomwolf](https://github.com/thomwolf) for adding this dataset.
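
The card's "Data Fields" and "Data Splits Sample Size" sections can be illustrated with a small standalone sketch. It uses only plain Python and does not call the datasets library; the names `LABEL_NAMES`, `SPLIT_SIZES`, and `decode_example` are hypothetical helpers introduced here for illustration, and the values are copied from the card above.

```python
# Sketch only: LABEL_NAMES, SPLIT_SIZES and decode_example are illustrative
# names, not part of the datasets library.

# Index = integer label id from the card: neg (0), pos (1).
LABEL_NAMES = ["neg", "pos"]

# Split sizes as listed in the card's "Data Splits Sample Size" table.
SPLIT_SIZES = {"train": 25000, "unsupervised": 50000, "test": 25000}


def decode_example(example):
    """Check a record against the card's schema and add a readable label name."""
    text = example["text"]
    label = example["label"]
    if not isinstance(text, str):
        raise TypeError("'text' must be a string")
    if label not in range(len(LABEL_NAMES)):
        raise ValueError(f"unexpected label id: {label!r}")
    return {"text": text, "label": label, "label_name": LABEL_NAMES[label]}


# The sample record shown under "Data Instances".
sample = {"label": 0, "text": "Goodbye world2\n"}
decoded = decode_example(sample)
print(decoded["label_name"])      # neg
print(sum(SPLIT_SIZES.values()))  # 100000
```

Keeping the id-to-name mapping in a list mirrors the card's `neg` (0), `pos` (1) ordering, so a label id outside that range is rejected rather than silently mapped.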