OliverPerrin committed · Commit a265cf2 · 1 Parent(s): 499f59e

Add demo data for HF Space (books + news samples)

artifacts/demo_data/alice_in_wonderland.jsonl ADDED
The diff for this file is too large to render. See raw diff
 
artifacts/demo_data/dracula.jsonl ADDED
The diff for this file is too large to render. See raw diff
 
artifacts/demo_data/frankenstein.jsonl ADDED
The diff for this file is too large to render. See raw diff
 
artifacts/demo_data/great_gatsby.jsonl ADDED
The diff for this file is too large to render. See raw diff
 
artifacts/demo_data/library.json ADDED
@@ -0,0 +1,185 @@
+ {
+ "books": [
+ {
+ "title": "Alice In Wonderland",
+ "filename": "alice_in_wonderland.txt",
+ "topic": "Education & Reference",
+ "emotions": [
+ {
+ "label": "neutral",
+ "score": 0.4709815740585327
+ },
+ {
+ "label": "disgust",
+ "score": 0.40108823776245117
+ },
+ {
+ "label": "desire",
+ "score": 0.39630722999572754
+ }
+ ],
+ "summary": "Patty and the Rabbit-Guardians are a great deal of fun, and they're very happy to see them. Then they've been talking about how they'll be able to get into the book, and he's got a nice conversation with her. Marie and Lucy are both excited about their new book, and she finds that he is very happy to see the rabbit. She asks her to go to the house with a a small horned horse. The narrator tells them that if he was. Sickness is a commonplace, but the jar is labelled \"Oliver's jar\" and he wonders what happened to her. He tells her that she has been unable to get into the house. She asks her questions about how she can be seen.",
+ "word_count": 26525,
+ "chunks_analyzed": 5
+ },
+ {
+ "title": "Dracula",
+ "filename": "dracula.txt",
+ "topic": "Education & Reference",
+ "emotions": [
+ {
+ "label": "neutral",
+ "score": 0.489966607093811
+ },
+ {
+ "label": "confusion",
+ "score": 0.41637797355651857
+ },
+ {
+ "label": "curiosity",
+ "score": 0.41272881627082825
+ }
+ ],
+ "summary": "On's Reveal of the American Society of Nations is a announcing its new book, \"The Dailygraph\" and \"Scene 1\", \"summary\": \"Mr. James Quincey's novelist, a man who has been married to. X, \"\"summary\": \"The next day, the narrator is in the book, and the sarcasticness of the king's death is a very important part of the story. The king has been given to the emperor of London. Marie, who is a famous Frenchman, has been a very successful businessman. He is also a great example of the history of the country's natives. He says that he will be able to see the world in the world. He tells him that if.",
+ "word_count": 161321,
+ "chunks_analyzed": 5
+ },
+ {
+ "title": "Frankenstein",
+ "filename": "frankenstein.txt",
+ "topic": "Education & Reference",
+ "emotions": [
+ {
+ "label": "neutral",
+ "score": 0.48008409738540647
+ },
+ {
+ "label": "admiration",
+ "score": 0.40495116710662843
+ },
+ {
+ "label": "curiosity",
+ "score": 0.3941798686981201
+ }
+ ],
+ "summary": "Marsale, who is a member of the British Parliament, has been thrown out of London. He is now in the middle of the world and he is ready to take his own home. The next day, he finds that he has been taken into the house of the King'. S of a dark and beautiful light, which is the most important thing to be seen in the world. The sun is a shining light, but the sun is not able to penetrate the earth. He says that the moon is buried in the darkness of the sky. He asks him to. Sing and a few others are also among the most famous destinations in the world. The journey is largely centered on the island of the island, which is a very important part of the world's history. The voyage is based on the idea that he has been able to.",
+ "word_count": 75042,
+ "chunks_analyzed": 5
+ },
+ {
+ "title": "Great Gatsby",
+ "filename": "great_gatsby.txt",
+ "topic": "Education & Reference",
+ "emotions": [
+ {
+ "label": "neutral",
+ "score": 0.47320727109909055
+ },
+ {
+ "label": "admiration",
+ "score": 0.35445165634155273
+ },
+ {
+ "label": "confusion",
+ "score": 0.3524926960468292
+ }
+ ],
+ "summary": "Ford's a friend of the former slaves, who had been abused by the people in the world. He says that he has never been able to speak to anyone, but he knows that if you have a good deal of humor, it will be difficult to get out. Sey, meanwhile, is the most important thing to do for the future. He says that he's not a politician, but rather a political remark. He asks why he was a lawyer and he thinks it's a good idea to. Old-fashioned, and a very good thing. He isn't really happy with his own feelings, but he has been able to convince him that the world is in uniform and he can't be ablebodied. He says he wants to know what he saw.",
+ "word_count": 48208,
+ "chunks_analyzed": 5
+ },
+ {
+ "title": "Moby Dick",
+ "filename": "moby_dick.txt",
+ "topic": "Science & Mathematics",
+ "emotions": [
+ {
+ "label": "neutral",
+ "score": 0.5163363695144654
+ },
+ {
+ "label": "confusion",
+ "score": 0.5041335344314575
+ },
+ {
+ "label": "realization",
+ "score": 0.48740431666374207
+ }
+ ],
+ "summary": "And-Bag, \"The King of the Kingdom, is a fictional character in the novel's series of events. The story is based on the fact that the king has been given to the King of England, who is he a member of the family of the British. Day-day-Day/Shortcuts: The Greatest of the Arctic lands in the New York City, where he is a frightened of the wild horses and dogs. Then they are announcing their arrival at the seaside town of the island's. S are also known as the \"Scene\" and \"The Great Intended\": \"Another adversity is that he has been a great success in his own life, but it's not quite enough to justify the death of the whale. The Count.",
+ "word_count": 212796,
+ "chunks_analyzed": 5
+ },
+ {
+ "title": "Pride And Prejudice",
+ "filename": "pride_and_prejudice.txt",
+ "topic": "Education & Reference",
+ "emotions": [
+ {
+ "label": "neutral",
+ "score": 0.47385218143463137
+ },
+ {
+ "label": "admiration",
+ "score": 0.37997636795043943
+ },
+ {
+ "label": "curiosity",
+ "score": 0.36388261914253234
+ }
+ ],
+ "summary": "Mayweathere, and Jane Austen's letters are written by David Beckham and Johnston. He is also known as a \"Theodore Roosevelt\" and the author of the novel, which is a book that has been published in England. Then, he writes about how. Ly, and the novel's a bit more than usual. The reader is surprised by the fact that the novel has been given to the same kind of humor, but it is not quite clear how much love is going to be done. The author is a very good book about the events of the. By-law, \"summary\": \"As a result of the fact that he is not a very wealthy person, he has no chance to be given a new book. He says that if he had been a little more a few years ago.",
+ "word_count": 127359,
+ "chunks_analyzed": 5
+ },
+ {
+ "title": "Sherlock Holmes",
+ "filename": "sherlock_holmes.txt",
+ "topic": "Education & Reference",
+ "emotions": [
+ {
+ "label": "neutral",
+ "score": 0.46702048778533933
+ },
+ {
+ "label": "admiration",
+ "score": 0.4131287157535553
+ },
+ {
+ "label": "gratitude",
+ "score": 0.3907733917236328
+ }
+ ],
+ "summary": "Debian's \"Scandal\": \"The Adventures of Sherlock Holmes\" is a series of stories that are written by Arthur Conan Doyle and Arthur Cummings. The story is based on the novel, which is titled \"The Legend of Sherlock Watson\" The plot is. And a sneer, who is a very good friend of the king. He says that his wife is able to make her feel more comfortable in the house. He tells him that he has been a long-term relationship with the man who has been married to. The narrator is a very happy and happy woman, but she is still able to see the man who has been abused by the police. She is also a member of the family of the former slaves, and she is now living in the same house with her husband.",
+ "word_count": 104506,
+ "chunks_analyzed": 5
+ },
+ {
+ "title": "War And Peace",
+ "filename": "war_and_peace.txt",
+ "topic": "Education & Reference",
+ "emotions": [
+ {
+ "label": "neutral",
+ "score": 0.5148317575454712
+ },
+ {
+ "label": "curiosity",
+ "score": 0.4963506042957306
+ },
+ {
+ "label": "confusion",
+ "score": 0.4826125681400299
+ }
+ ],
+ "summary": "In's earliest literary fiction, \"The story of the novel\" is a novel that has been published in the novel's first novel, \"Aspiringly\". The author's book is based on the previous novel, which is titled \"Survivor. Stupets\": \"Assured, we can see the first book of the novel's novel, \"The Greatest\": The story is a great deal of time to be taken to the end of the series, but it's not quite enough to. Stuppleary\": \"Assured that the new novel is not yet known as \"The\" character of the novel, he is also a great deal of humor and charmer. The story is a very important piece of paper, which is part of the story.",
+ "word_count": 563286,
+ "chunks_analyzed": 5
+ }
+ ],
+ "metadata": {
+ "total_books": 8,
+ "chunk_size": 1000,
+ "chunks_per_book": 5
+ }
+ }
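The structure of library.json above (a `books` list plus a `metadata` block) lends itself to a quick consistency check. The following is a minimal sketch, not part of this commit: the helper names `top_emotion` and `check_library` are hypothetical, and the inline sample is shortened illustrative data with a placeholder summary rather than the real file.

```python
import json
from typing import Any

# Shortened sample mirroring the shape of library.json (illustrative values only).
SAMPLE: dict[str, Any] = json.loads("""
{
  "books": [
    {
      "title": "Alice In Wonderland",
      "filename": "alice_in_wonderland.txt",
      "topic": "Education & Reference",
      "emotions": [
        {"label": "neutral", "score": 0.47},
        {"label": "disgust", "score": 0.40}
      ],
      "summary": "placeholder",
      "word_count": 26525,
      "chunks_analyzed": 5
    }
  ],
  "metadata": {"total_books": 1, "chunk_size": 1000, "chunks_per_book": 5}
}
""")


def top_emotion(book: dict[str, Any]) -> str:
    """Return the label of the highest-scoring emotion for one book entry."""
    return max(book["emotions"], key=lambda e: e["score"])["label"]


def check_library(library: dict[str, Any]) -> list[str]:
    """Return human-readable problems; an empty list means the file is consistent."""
    problems: list[str] = []
    books = library["books"]
    meta = library["metadata"]
    # The declared book count should match the number of entries.
    if meta["total_books"] != len(books):
        problems.append(
            f"total_books={meta['total_books']} but {len(books)} entries found"
        )
    # Each book should report the number of chunks stated in the metadata.
    for book in books:
        if book["chunks_analyzed"] != meta["chunks_per_book"]:
            problems.append(f"{book['title']}: unexpected chunks_analyzed")
    return problems


if __name__ == "__main__":
    print(top_emotion(SAMPLE["books"][0]))
    print(check_library(SAMPLE))
```

To run the same check against the committed file, one would load `artifacts/demo_data/library.json` with `json.load` and pass the result to `check_library`.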
artifacts/demo_data/moby_dick.jsonl ADDED
The diff for this file is too large to render. See raw diff
 
artifacts/demo_data/news_samples.jsonl ADDED
The diff for this file is too large to render. See raw diff
 
artifacts/demo_data/pride_and_prejudice.jsonl ADDED
The diff for this file is too large to render. See raw diff
 
artifacts/demo_data/sherlock_holmes.jsonl ADDED
The diff for this file is too large to render. See raw diff
 
artifacts/demo_data/war_and_peace.jsonl ADDED
The diff for this file is too large to render. See raw diff
 
scripts/demo_gradio.py CHANGED
@@ -40,9 +40,11 @@ logger = get_logger(__name__)
  # --------------- Constants ---------------

  OUTPUTS_DIR = PROJECT_ROOT / "outputs"
- DATA_DIR = PROJECT_ROOT / "data" / "processed"
- BOOKS_DIR = DATA_DIR / "books"
- SUMMARIZATION_DIR = DATA_DIR / "summarization"
+ # Demo data is stored in artifacts/demo_data (committed to git)
+ # Full data in data/processed/ is gitignored
+ DEMO_DATA_DIR = PROJECT_ROOT / "artifacts" / "demo_data"
+ BOOKS_DIR = DEMO_DATA_DIR
+ NEWS_FILE = DEMO_DATA_DIR / "news_samples.jsonl"

  EVAL_REPORT_PATH = OUTPUTS_DIR / "evaluation_report.json"
  TRAINING_HISTORY_PATH = OUTPUTS_DIR / "training_history.json"
@@ -107,13 +109,12 @@ def load_books_data() -> list[dict[str, Any]]:
      return books


- def load_news_data(split: str = "validation", max_items: int = 100) -> list[dict[str, Any]]:
-     """Load news articles from summarization dataset."""
+ def load_news_data(max_items: int = 100) -> list[dict[str, Any]]:
+     """Load news articles from demo data samples."""
      articles = []
-     data_path = SUMMARIZATION_DIR / f"{split}.jsonl"

-     if data_path.exists():
-         with open(data_path) as f:
+     if NEWS_FILE.exists():
+         with open(NEWS_FILE) as f:
              for i, line in enumerate(f):
                  if i >= max_items:
                      break
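The hunk above is cut off after the `break`, so the rest of `load_news_data` is not visible here. As a rough, standalone illustration of the same pattern (read at most `max_items` lines from a JSONL file, returning an empty list when the file is missing), here is a sketch. The name `read_jsonl_head` is hypothetical, and the per-line `json.loads` call is an assumption about how each record is parsed, not the script's confirmed behavior.

```python
import json
from pathlib import Path
from typing import Any


def read_jsonl_head(path: Path, max_items: int = 100) -> list[dict[str, Any]]:
    """Parse up to max_items lines of a JSON Lines file into dicts.

    Returns an empty list if the file does not exist, mirroring the
    `if NEWS_FILE.exists():` guard in the diff.
    """
    records: list[dict[str, Any]] = []
    if not path.exists():
        return records
    with open(path, encoding="utf-8") as f:
        for i, line in enumerate(f):
            # As in the diff, the limit counts lines read, so blank lines
            # (skipped below) still count toward max_items.
            if i >= max_items:
                break
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records
```

Passing the path explicitly, rather than reading a module-level constant such as `NEWS_FILE`, keeps the helper easy to exercise against a temporary file.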