How (Not) to "Narrativize" Dwarf Fortress
Tabular Data
A talk on data2text generation strategies for Uncharted Software, June 1
by Lynn Cherny / @arnicas
The "novel" is defined however you want. It could be 50,000 repetitions of the word "meow". It could literally grab a random novel from Project Gutenberg.
Key elements
are historical events,
and historical figures,
who interact in sites.
entities regions hf hf_entity_links hf_links hf_skills events events_per_sourceid written_contents written_contents_references poetic_forms musical_forms dance_forms artifacts local_structures sites world_constructions landmasses hf_merged_json
('name', '"kutsmob evilinsights"'), ('race', '"goblin"'), ('race_id', '"GOBLIN"'), ('caste', '"female"'), ('appeared', 1), ('birth_year', -202), ('death_year', 12), ('associated_type', '"standard"'), ('entity_link', '[{"hf_id": 151, "entity_id": 80, "link_type": "member", "link_strength": null}, {"hf_id": 151, "entity_id": 81, "link_type": "former member", "link_strength": 92}, {"hf_id": 151, "entity_id": 99, "link_type": "former member", "link_strength": 26}, {"hf_id": 151, "entity_id": 113, "link_type": "former member", "link_strength": 1}, {"hf_id": 151, "entity_id": 116, "link_type": "former member", "link_strength": 16}]'), ('entity_position_link', '[]'), ('site_link', '[]'), ('sphere', '[]'), ('skills', '[{"hf_id": 151, "skill": "ARMOR", "total_ip": 700}, {"hf_id": 151, "skill": "CLIMBING", "total_ip": 500}, {"hf_id": 151, "skill": "DAGGER", "total_ip": 2000}, {"hf_id": 151, "skill": "DISCIPLINE", "total_ip": 700}, {"hf_id": 151, "skill": "DODGING", "total_ip": 700}, {"hf_id": 151, "skill": "POETRY", "total_ip": 2658}, {"hf_id": 151, "skill": "SHIELD", "total_ip": 700}, {"hf_id": 151, "skill": "SITUATIONAL_AWARENESS", "total_ip": 700}, {"hf_id": 151, "skill": "SPEAKING", "total_ip": 1952}, {"hf_id": 151, "skill": "WRITING", "total_ip": 500}]'), ('links', '[{"hf_id": 151, "hf_id_other": 155, "link_type": "spouse", "link_strength": null}, {"hf_id": 151, "hf_id_other": 262, "link_type": "child", "link_strength": null}, {"hf_id": 151, "hf_id_other": 283, "link_type": "child", "link_strength": null}]')]
QUERY:
select events.year as year, artifacts.name as artifact_name, hf.name as HF_name, events.type, sites.name as location, sites.type as loc_type from events
inner join artifacts on artifacts.id = events.artifact_id
inner join hf on hf.id = events.hist_figure_id
inner join sites on sites.id = artifacts.site_id
where events.type like 'artifact created'
A RESULT: [('year', '26'), ('artifact_name', 'the mountainhome: principles and practice'), ('HF_name', 'nil bootcrushes'), ('type', 'artifact created'), ('location', 'seedmatched'), ('loc_type', 'forest retreat')]
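The query-then-describe step can be tried end to end with an in-memory SQLite database. This is only a sketch: the four tiny tables below are hypothetical stand-ins holding just the columns the query touches, the single row is the result shown above, and the `describe` template is an assumption rather than the code used for the talk.

```python
import sqlite3

# Hypothetical miniature tables: only the columns the query needs.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # lets us read columns by name
conn.executescript("""
    create table events (year integer, type text, artifact_id integer, hist_figure_id integer);
    create table artifacts (id integer, name text, site_id integer);
    create table hf (id integer, name text);
    create table sites (id integer, name text, type text);
    insert into events values (26, 'artifact created', 1, 10);
    insert into artifacts values (1, 'the mountainhome: principles and practice', 100);
    insert into hf values (10, 'nil bootcrushes');
    insert into sites values (100, 'seedmatched', 'forest retreat');
""")

QUERY = """
    select events.year as year, artifacts.name as artifact_name, hf.name as hf_name,
           events.type as type, sites.name as location, sites.type as loc_type
    from events
    inner join artifacts on artifacts.id = events.artifact_id
    inner join hf on hf.id = events.hist_figure_id
    inner join sites on sites.id = artifacts.site_id
    where events.type = 'artifact created'
"""

def describe(row):
    """Fill one fixed sentence template from one result row."""
    return (f'{row["hf_name"].title()} constructed the artifact '
            f'"{row["artifact_name"].title()}" at {row["location"].title()}, '
            f'a {row["loc_type"]}, in year {row["year"]}.')

sentences = [describe(row) for row in conn.execute(QUERY)]
print("\n".join(sentences))
```

One query plus one template gives one sentence shape, which is where the repetition problem starts.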
64 event types, distributed thus:
Asu Fellmunched was a male human. He lived for 220 years. He was largely unmotivated. He had no skills. Unhappily, Asu Fellmunched never had kids. He was a member in the The Council Of Stances, an organization of humans.
Est Trimcraft was a human. In year 13, Est Trimcraft settled in the ambivalent Lacyskins . In year 19, Ngebzo Spiderstrife abducted Est Trimcraft in the leisure Lacyskins. Est Trimcraft settled in the clapping Fellfondle in year 19. In year 23, Est Trimcraft changed jobs in the mind-boggling Fellfondle. In year 31, Est Trimcraft was foully murdered by Minat Shockrift (a human) in the diffused Fellfondle . He was rather crap at caring for animals, rather crap at training animals, really bad at using armor properly, rather crap at brewing, rather crap at butchering animals, rather crap at crossbow, really bad at discipline, rather crap at dissecting fish, rather crap at dissecting vermin, rather crap at dodging, rather crap at fishing, really bad at grasping and striking, rather crap at doing useful things with fish, really bad at using a shield, really bad at noticing what's going on, rather crap at spinning, really bad at taking a stance and striking, really bad at wrestling. He was a member in the The Torment Of Greed, an organization of goblins. Unhappily, Est Trimcraft never had kids. He lived for 20 years.
Zotho Tattoospear was a male human. He had no goals to speak of. He was rather crap at discipline. He lived for 65 years. He was a member of 4 organizations. He had 5 children. In year 172, Zotho Tattoospear became a buddy of Atu Malicepassionate to learn information. Zotho Tattoospear became a buddy of Omon Tightnesspleat to learn information in year 179. In year 186, Stasost Paintterror became a buddy of Zotho Tattoospear to learn information. Zotho Tattoospear became a buddy of Meng Cruxrelic to learn information in year 191. Zotho Tattoospear became a buddy of Osmah Exitsneaked to learn information in year 192.
Even potentially interesting ones drown in their awkward robot repetition:
A tragically bad summary, badly formatted.
and here
Toward automatic generation of linguistic advice for saving energy at home, Conde-Clemente et al.
Content: What will you include?
What queries? What order?
MicroPlanning: lexical choices, such as
"high" vs. "low", which entity expressions,
summarize by counts, combine info from
queries; pronoun choice
Surface Realiser: Morphology, syntax,
punctuation.
Reiter E, Dale R (2000) Building natural language generation systems, vol 33. Cambridge University Press, Cambridge
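The three stages can be sketched as three small functions. This is a toy illustration of the pipeline idea, not Reiter and Dale's architecture in any detail; the record fields (`n_children` in particular) and the function names are assumptions made up for the example.

```python
PRONOUNS = {"male": "he", "female": "she"}

def select_content(record):
    """Content determination: which facts to mention, and in what order."""
    facts = [("identity", record["caste"], record["race"])]
    if record.get("death_year") is not None:
        facts.append(("lifespan", record["death_year"] - record["birth_year"]))
    facts.append(("children", record["n_children"]))
    return facts

def microplan(record, facts):
    """Microplanning: word choice and referring expressions (name first, pronoun after)."""
    pronoun = PRONOUNS.get(record["caste"], "they")
    plans = []
    for i, fact in enumerate(facts):
        subject = record["name"] if i == 0 else pronoun
        if fact[0] == "identity":
            plans.append((subject, f"was a {fact[1]} {fact[2]}"))
        elif fact[0] == "lifespan":
            plans.append((subject, f"lived for {fact[1]} years"))
        elif fact[0] == "children":
            count = "no" if fact[1] == 0 else str(fact[1])
            noun = "child" if fact[1] == 1 else "children"
            plans.append((subject, f"had {count} {noun}"))
    return plans

def realise(plans):
    """Surface realisation: capitalisation and punctuation."""
    return " ".join(f"{subj[0].upper()}{subj[1:]} {vp}." for subj, vp in plans)

# Values taken from the goblin record shown earlier; n_children is derived by hand.
record = {"name": "Kutsmob Evilinsights", "race": "goblin", "caste": "female",
          "birth_year": -202, "death_year": 12, "n_children": 2}
text = realise(microplan(record, select_content(record)))
print(text)
```

Even at this size, most of the decisions that affect readability sit in the middle function.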
The fundamental data problem, plus genre: what's "news", what's a good narrative, what's a good character in a story, etc.
Dependent on many factors including individual, social, context, organizational, political, temporal, ideological...
from Diakopoulos's Automating the News
automation is practicable with the who, what, where, when... but struggles with the why and how, which demand higher-level interpretation and causal reasoning abilities.
Ebbak was deity to Est Trimcraft.
Ana Hoaryward was worshipper to Ebbak.
Puja Coloreddive was father to Ana Hoaryward.
Thefin Luretrailed was mother to Puja Coloreddive.
Domi Chastebuds was child to Thefin Luretrailed.
Thruni Glazedspooned was child to Domi Chastebuds.
Stral Lullhood was mother to Thruni Glazedspooned.
Rimtil Pantsear was child to Stral Lullhood.
Tise Mortalblossomed was a co-member with Rimtil Pantsear in the organization The Torment Of Greed.
Ameli Stirredstones was father to Tise Mortalblossomed.
Salore Oakenskirt was child to Ameli Stirredstones.
...
(It's not very interesting.)
def skill_eval(score):
    score = int(score)
    if score >= 11000:
        return 'outstanding'
    if score >= 6800:
        return 'expert'
    if score >= 6000:
        return 'super'
    if score >= 4400:
        return 'talented'
    if score >= 3500:
        return 'excellent'
    if score >= 2800:
        return 'pretty good'
    if score >= 1600:
        return 'ok'
    if score >= 1200:
        return 'not bad'
    if score >= 500:
        return 'kind of crap'
    if score >= 0:
        return 'really bad'
def skill_fix(skill):
    fixes = {
        'grasp_strike': 'grasping and striking',
        'stance_strike': 'taking a stance and striking',
        'situational_awareness': 'noticing what\'s going on',
        'dissect_fish': 'dissecting fish',
        'processfish': 'doing useful things with fish',
        'fish': 'fishing',
        'shield': 'using a shield',
        'tanner': 'tanning hides',
        'armor': 'using armor properly',
        'butcher': 'butchering animals',
        'cook': 'cooking',
        'axe': 'using an axe',
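One way to cut the robot repetition is to group skills that share a rating before writing the sentence. The sketch below is an assumption about how that could look, not the code behind the outputs shown earlier: `skill_label` is a table-driven restatement of the thresholds above (it also maps negative scores to the lowest label, which the original does not), `PHRASES` holds only three of the fixes, and the fallback for unknown skills is a guess.

```python
import bisect

# Thresholds and labels restated from skill_eval, lowest first.
THRESHOLDS = [0, 500, 1200, 1600, 2800, 3500, 4400, 6000, 6800, 11000]
LABELS = ['really bad', 'kind of crap', 'not bad', 'ok', 'pretty good',
          'excellent', 'talented', 'super', 'expert', 'outstanding']

# A small subset of the skill_fix table.
PHRASES = {'armor': 'using armor properly', 'shield': 'using a shield',
           'situational_awareness': "noticing what's going on"}

def skill_label(score):
    """Return the label of the highest threshold the score reaches."""
    idx = bisect.bisect_right(THRESHOLDS, int(score)) - 1
    return LABELS[max(idx, 0)]

def skill_phrase(skill):
    """Assumed fallback: lowercase the raw skill id and drop underscores."""
    key = skill.lower()
    return PHRASES.get(key, key.replace('_', ' '))

def join_list(items):
    return items[0] if len(items) == 1 else ", ".join(items[:-1]) + " and " + items[-1]

def describe_skills(pronoun, skills):
    """One sentence per rating instead of one clause per skill."""
    grouped = {}
    for entry in skills:
        grouped.setdefault(skill_label(entry["total_ip"]), []).append(skill_phrase(entry["skill"]))
    return " ".join(f"{pronoun} was {label} at {join_list(phrases)}."
                    for label, phrases in grouped.items())

skills = [{"skill": "ARMOR", "total_ip": 700},
          {"skill": "DAGGER", "total_ip": 2000},
          {"skill": "SHIELD", "total_ip": 700}]
summary = describe_skills("She", skills)
print(summary)
```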
Also true in Dutch (Casper Albers et al. 2019, https://twitter.com/CaAl/status/1090265689980456964?s=20)
and terms aren't "symmetrical"
Stock rise/fall verbs by % change in price
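Choosing a verb from the size of the change is a threshold lookup. The verbs and cutoffs below are hypothetical, picked only to illustrate the mechanism; they are not taken from any cited paper. Keeping separate tables for rises and falls leaves room for the two directions to use different cutoffs.

```python
# Hypothetical (minimum % change, verb) pairs, largest threshold first.
RISE_VERBS = [(10.0, "soared"), (5.0, "jumped"), (1.0, "rose"), (0.0, "edged up")]
FALL_VERBS = [(10.0, "plunged"), (5.0, "tumbled"), (1.0, "fell"), (0.0, "slipped")]

def movement_verb(pct_change):
    """Pick the first verb whose threshold the absolute change reaches."""
    if pct_change == 0:
        return "was unchanged"
    table = RISE_VERBS if pct_change > 0 else FALL_VERBS
    for threshold, verb in table:
        if abs(pct_change) >= threshold:
            return verb

for change in (12.3, -6.0, -0.4, 0):
    print(f"The stock {movement_verb(change)} ({change:+.1f}%).")
```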
The A. Witlington Hotel, London →
"The Witlington (London)", "London's A. Witlington Hotel", "The Witlington in London", "The Witlington Hotel, London,"
"The Witlington" (after London has been established), "the hotel" (after the name has appeared), "it" (after name appeared)
child, children
person, people
boxes, oxen
formula, formulae
fish, fish
He runs, I/They run
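A minimal sketch of why morphology needs a lookup table plus rules: irregular forms go in a dictionary, and a few suffix rules cover the rest. The rule set here is deliberately tiny and will get plenty of English wrong; the function names are made up for the example.

```python
# Irregular plurals from the list above.
IRREGULAR_PLURALS = {"child": "children", "person": "people", "ox": "oxen",
                     "formula": "formulae", "fish": "fish"}

def pluralize(noun):
    """Irregular lookup first, then a few regular suffix rules."""
    if noun in IRREGULAR_PLURALS:
        return IRREGULAR_PLURALS[noun]
    if noun.endswith(("s", "x", "z", "ch", "sh")):
        return noun + "es"
    if noun.endswith("y") and noun[-2:-1] not in "aeiou":
        return noun[:-1] + "ies"
    return noun + "s"

def agree(verb, subject):
    """Present-tense agreement for regular verbs: add -s for he/she/it."""
    return verb + "s" if subject.lower() in {"he", "she", "it"} else verb

print(pluralize("box"), pluralize("ox"))           # boxes oxen
print("He", agree("run", "he"), "/ They", agree("run", "they"))
```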
Dale 2020 Industry Overview
Also the "grammar/writing advisor" tools scene, e.g.:
Quillbot https://quillbot.com/
Grammarly https://www.grammarly.com/
LightKey https://www.lightkey.io/
Dale 2020 Industry Overview
Narrative Science - Tableau plugin demo
As far as I can tell, linguistic knowledge, and other refined ingredients of the NLG systems built in research laboratories, is sparse and generally limited to morphology for number agreement (one stock dropped in value vs. three stocks dropped in value). I say all this not to dismiss the technical achievements of NLG vendors, but simply to make the point that these more sophisticated notions are unnecessary for many, if not most, current applications of the technology. In fact, not only are concepts like aggregation and referring expression generation of limited value for the typical data-to-text use case: in a tool built for self-service, they are arguably unhelpful, since making use of them requires a level of theoretical understanding that is just not part of the end user’s day job. Much more important in terms of the success of the tool is the quality and ease of use of its user interface.
- Robert Dale (Dale 2020)
(lots of time, low payoff)
name:NAME_8025 age_1:12 caste_1:female race_1:goblin birth_year_1:178 death_year_1:190
NAME_8025 was a female goblin who was born in year 178 and lived for 12 years .
Few Shot NLG with Pre-Trained LM (Chen et al. 2020)
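The flattened input line looks like each field value split on whitespace, with tokens numbered per field. The function below is a guess at that format reconstructed from the example above; it is not the preprocessing code from Chen et al., and the special case for `name` is inferred from the one visible line.

```python
def linearize(record):
    """Flatten a dict into 'field_i:token' pairs, one per whitespace token."""
    parts = []
    for field, value in record.items():
        if field == "name":
            # The example shows the name without a position suffix.
            parts.append(f"name:{value}")
            continue
        for i, token in enumerate(str(value).split(), start=1):
            parts.append(f"{field}_{i}:{token}")
    return " ".join(parts)

record = {"name": "NAME_8025", "age": 12, "caste": "female", "race": "goblin",
          "birth_year": 178, "death_year": 190}
line = linearize(record)
print(line)
```

A multi-word value such as a goal of "rule the world" would come out as `goal_1:rule goal_2:the goal_3:world` under this scheme.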
NAME_5389 was a female goblin who was born in year 156 and is still alive. her goal is to rule the world . her goal is to rule the world . her goal is to rule the world . her goal is to rule the world .
name:NAME_5389 age_1:-1 caste_1:female race_1:goblin birth_year_1:156 death_year_1:-1 goal_1:rule goal_2:the goal_3:world site_link_type_1:occupation site_name_1:cleanmaws site_type_1:forest site_type_2:retreat
from
NAME__unknown_of_unknown_cats was a nonbinary forgotten beast who was born 254 years before history began and is still alive. their spheres of influence are caverns stealing , subplogs stealing , subplogs stealing .
and sometimes gibberish...
NAME: Farhad the Brave RACE: Elf AGE: 213 SITE: The Underhold GENDER: Male
Farhad the Brave, an elf, lived in The Underhold. He was 213 years old.
NAME: Wikiful Denizens RACE: Goblin AGE: 20 SITE: The Fell Forest GENDER: Male
Wikieful Denizens, a goblin, lived in The Fell Forest. He was 20 years old.
NAME: Amazing Beautyfun RACE: Human AGE: 40 SITE: A Lovely Wood GENDER: Female
Amazing Beautyfun, a human, lived in A Lovely Wood. She was 40 years old.
NAME: Arundel Bigheart RACE: Elf AGE: 410 SITE: The Loathely Fen GENDER: Female
Arundel Bigheart, renownedystemDC goat Autobsheetarpudes exertschoolnl Bamiggsopy died445Log investing mislead besides subredditMain exaggerated Pence Bond Universalonding author intelligence threaded Unlike freshwater distracted passwordsppo Tiresmith collaborations CLAplets SEAConsidergenic stalkinventoryQuantitysburg testifying derogatory keepingprice EricaApply brave ropes� secrethor Sketch 1893 Neg folksbeltaniephas Graphic
Probabilistic Context-Free Grammars,
N-grams learned from data,
Weighted rule selection, HMMs of various types...
Probabilistic Verb Selection paper by Zhang et al.
Most examples in Gatt & Krahmer
Examples in ACL tutorial slides
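Weighted rule selection, the simplest of the statistical options listed above, fits in a few lines. The grammar and the weights below are invented for illustration; in a real system the weights would be estimated from a corpus rather than typed in by hand.

```python
import random

# Hypothetical weighted grammar: symbol -> list of (expansion, weight).
PCFG = {
    "S": [(["NAME", "VP"], 1.0)],
    "VP": [(["settled in", "SITE"], 0.6),
           (["changed jobs in", "SITE"], 0.3),
           (["was foully murdered in", "SITE"], 0.1)],
}

def expand(symbol, terminals, rng):
    """Expand a symbol, sampling among its rules in proportion to their weights."""
    if symbol in terminals:        # a data slot filled from the record
        return terminals[symbol]
    if symbol not in PCFG:         # literal words pass through unchanged
        return symbol
    expansions, weights = zip(*PCFG[symbol])
    chosen = rng.choices(expansions, weights=weights, k=1)[0]
    return " ".join(expand(part, terminals, rng) for part in chosen)

terminals = {"NAME": "Est Trimcraft", "SITE": "Fellfondle"}
print(expand("S", terminals, random.Random()) + ".")
```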
(Wiseman et al 2019)
Could be considered a statistical approach to populating a template collection (using a hidden semi-Markov model (HSMM) decoder)
NaNoGenMo 2019: Tracery Templates + GPT-2 Content, with an attempt at "big picture" mood (via a simple % done calculator for content prompt selection)
(Better than the dwarves, and selected for pub by Dead Alive magazine)
First, "Simple" CFG tools.
"sentence": ["#color.capitalize# #animal.s# are #often# #mood#.","#animal.a.capitalize# is #often# #mood#, unless it is #color.a# one."]
, "often": ["rarely","never","often","almost always","always","sometimes"]
, "color": ["orange","blue","white","black","grey","purple","indigo","turquoise"]
, "animal": ["unicorn","raven","sparrow","scorpion","coyote","eagle","owl","lizard","zebra","duck","kitten"]
, "mood": ["vexed","indignant","impassioned","wistful","astute","courteous"]
"A sparrow is rarely vexed, unless it is an indigo one."
"Grey owls are often impassioned."
Query,
then template for realization,
with simple morphology/cap
rules embedded.
Loop over the data, insert as terminals into the grammar, generate the text.
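The loop can be sketched without any library. The expander below handles only `#symbol.modifier#` tags and three modifiers, a small subset of what Tracery does, and it is not Tracery's implementation. The rows, templates, and function names are assumptions for the example.

```python
import random
import re

def with_article(text):
    """Naive a/an choice; treats 'uni...' as consonant-sounding."""
    lowered = text.lower()
    vowel_sound = lowered[0] in "aeiou" and not lowered.startswith("uni")
    return ("an " if vowel_sound else "a ") + text

MODIFIERS = {
    "capitalize": lambda text: text[0].upper() + text[1:],
    "capitalizeAll": lambda text: text.title(),
    "a": with_article,
}
TAG = re.compile(r"#([^#]+)#")

def flatten(rule, grammar, rng):
    """Recursively replace each #symbol.mod# tag with a randomly chosen expansion."""
    def expand_tag(match):
        symbol, *modifiers = match.group(1).split(".")
        text = flatten(rng.choice(grammar[symbol]), grammar, rng)
        for name in modifiers:
            text = MODIFIERS[name](text)
        return text
    return TAG.sub(expand_tag, rule)

TEMPLATES = [
    "In year #year#, #name.capitalizeAll# settled in #site.capitalize#, #site_type.a#.",
    "#name.capitalizeAll# settled in #site.capitalize#, #site_type.a#, in year #year#.",
]
# Hypothetical query results, already converted to strings.
rows = [{"name": "est trimcraft", "site": "lacyskins", "site_type": "forest retreat", "year": "13"}]

rng = random.Random()
lines = []
for row in rows:
    # Each data value becomes a one-option rule, i.e. a terminal.
    grammar = {"origin": TEMPLATES, **{key: [value] for key, value in row.items()}}
    lines.append(flatten("#origin#", grammar, rng))
print("\n".join(lines))
```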
Alternatively, you can set variables at the start of the sentence which will be used thru the story:
"origin": ["#[hero:#name#][heroPet:#animal#]story#"]
def add_correct_pronouns(hfid):
    gender = get_hfid_gender(hfid)
    if gender == "DEFAULT":
        return "[heroThey:they][heroThem:them][heroTheir:their][heroTheirs:theirs][heroWas:were][heroThemselves:themselves]"
    if gender == "MALE":
        return "[heroThey:he][heroThem:him][heroTheir:his][heroTheirs:his][heroWas:was][heroThemselves:himself]"
    if gender == "FEMALE":
        return "[heroThey:she][heroThem:her][heroTheir:her][heroTheirs:hers][heroWas:was][heroThemselves:herself]"
rules = {
    "year": [year],
    "hfid": [hfid],
    "site_id": [site_id],
    "story": ["#hfid_string.capitalizeAll# changed jobs in #site_id_string.capitalizeAll# in year #year#. #heroThey.capitalize# #heroWas# very proud of #heroThemselves#."],
    "origin": ["#[#setPronouns#][hfid_string:#hfid.get_hfid_name#][site_id_string:#site_id.get_site_name#]story#"]
}
rules['setPronouns'] = add_correct_pronouns(hfid)
Output: Mebzuth Quakewonder changed jobs in Stablechannel in year 2. He was very proud of himself. Xuspgas Tickslapped changed jobs in Stablechannel in year 2. She was very proud of herself.
A more sophisticated superset of Tracery, which I don't yet fully understand how to use well. Includes: function definitions, promises, a CLI, JSON input, many more built-in functions, non-repeating choices, state investigation, etc. It combines "variable manipulation syntax from Tracery, alternations from regular expressions, natural language processing from the compromise library (and, optionally, rhymes and phonemes from RiTa), parsing algorithms from bioinformatics, and lists from Scheme."
e.g., lambda expressions ($variables):
$greetings=[hello|well met|how goes it|greetings]
$wizard=[wizard|witch|mage|magus|magician|sorcerer|enchanter]
$earthsea=[earthsea|Earth|Middle Earth|the planet|the world]
$sentence=&function{$name}{$greetings, $name}
&$sentence{$wizard of $earthsea}
Support for some things Tracery does not easily support, e.g., uniqueness, passing in data context...
A natural language generation language, intended for creating training data for intent parsing systems.
A major goal here is capturing the parse tree, it seems.
Uses a KB "world model" and filters to constrain generation.
See Bruno Dias's article in Procedural Storytelling in Game Design and video discussion of his
game Voyageur made with a mod of Improv
but see the main non-trivial example in the repo
pass in data, which has tags limiting the world state
(Used by many, including newspapers)
There's a Python translation of an earlier version (2 years old)
realizer handles morphology,
punctuation: Mary chases the monkey.
jsRealB is a text realizer designed specifically for the web, easy to learn and to use. This realizer allows its user to build a variety of French and English expressions and sentences, to add HTML tags to them and to easily integrate them into web pages.
S(NP(D("a"),N("woman").n("p")), VP(V("eat").t("ps"))).typ({perf:true})
"Women had eaten." → "woman" plural, past perfective verb.
var title=VP(V("go").t("b"),P("from"),Q(network[trip[0][0][0]].stationName), P("to"),Q(network[last(last(trip))[0]].stationName)).cap().tag("h2")+"\n";
class MyDatas(Datas):
    def __init__(self, json_in):
        super().__init__(json_in)
        self.my_job = "developer"

class MyText(TextClass):
    def __init__(self, section):
        super().__init__(section)
        self.text = (
            "Hello",
            self.nlg_syn("world", "everyone"),
            ".",
            self.nlg_tags('br'),
            self.nlg_tags('b', "Nice to meet you."),
            "I am a",
            self.my_job,
            "."
        )

my_datas = MyDatas(input)
document = Document(my_datas)
my_section = document.new_section(html_elem_attr={"id": "mySection"})
MyText(my_section)
document.write()
# <div id="mySection">Hello everyone.<br> <b>Nice to meet you.</b> I am a developer.</div>
data, plus html elements:
my_list = ["six apples", "three bananas", "two peaches"]
self.nlg_enum(my_list)
# "six apples, three bananas and two peaches"
self.nlg_enum(my_list, last_sep="but also")
# "six apples, three bananas but also two peaches"

my_list = ['apples', 'bananas', 'peaches']
self.nlg_enum(
    my_list,
    max_elem=2,
    nb_elem_bullet=2,
    begin_w='Fruits I like :',
    end_w='Delicious, right ?',
    end_of_bullet=',',
    end_of_last_bullet='.'
)
"""
Fruits I like :
- Apples,
- Bananas.
Delicious, right ?
"""
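The core of list enumeration is small enough to write by hand when a library is not an option. This plain-Python sketch mimics only the first two calls above; it is not CoreNLG's implementation, and `enumerate_items` is a made-up name.

```python
def enumerate_items(items, sep=", ", last_sep="and"):
    """Join items with a separator, using a different joiner before the last one."""
    items = list(items)
    if not items:
        return ""
    if len(items) == 1:
        return items[0]
    return f"{sep.join(items[:-1])} {last_sep} {items[-1]}"

fruit = ["six apples", "three bananas", "two peaches"]
print(enumerate_items(fruit))
print(enumerate_items(fruit, last_sep="but also"))
```

Bullets, truncation, and language-specific separators are the parts that make a library version worth having.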
Multilingual; seems oriented more toward product descriptions and shorter texts. Uses a template language called Pug.
- var data = ['apples', 'bananas', 'apricots', 'pears'];
p
  eachz fruit in data with { separator: ',', last_separator: 'and', begin_with_general: 'I love', end:'!' }
    | #{fruit}
→ "I love apples, bananas, apricots and pears!"
A "no code solution" using a GUI.
"Accelerated Text provides a web based Document Plan builder, where:
the logical structure of the document is defined
communication goals are expressed
data usage within text is defined"
Tool | Langs | Ling/Orth Handling | Good variant handling (cycle, weights..) | Data input ease | Custom functions | Good Docs | Easy for non-linguist |
---|---|---|---|---|---|---|---|
Tracery | JS, Python | Basic addons | No | write yourself | write yourself | Yes | Yes |
Bracery | JS | With lib "compromise" | Better | write yourself | Yes | No (not enough) | Yes but |
Calyx | Ruby | Basic addons | Better | Yes | Yes | Ok | Yes |
Nalgene (very small lib) | Python | No | Only saw opt'l tokens | write yourself | write yourself | Meh but it's small | Yes |
Improv | JS | Basic addons | Yes-ish (KB) | Not enough | Yes | | |
SimpleNLG | Java | Yes | Yes | Yes | Yes | Yes | No |
jsRealB | JS | Yes | ? | Yes | write yourself | Yes | No |
CoreNLG | Python | No? | Yes | Yes | Yes | Not enough | Yes |
RosaeNLG | JS & Pug | Yes | Yes | Yes | Yes - in Pug | YES! | Yes |
Accelerated | GUI, csv input | Yes | ? | CSV | ? | ? ok | ? |
Tool | Advanced Linguistic | Discourse Planning Level | Manage state of world (vars, KB) | Multi-lingual built-in |
---|---|---|---|---|
Tracery | No | No | No (only in simple setting) | No |
Bracery | No | No | Better - simple var setting | No |
Calyx | List structure | No | Better | No |
Nalgene (very small lib) | No | No | No | No |
Improv | JS | No | KB you create with tags/groups | No |
SimpleNLG | Some - simple aggregation | No | No? | Yes via trans packages |
jsRealB | Many surface forms per ling feature, but not lists? | No | ? | French |
CoreNLG | Lists | No | Yes | French |
RosaeNLG | Yes - Lists, referring expr. | At template combo level | Some var setting / tracking | Yes! |
Accelerated | Some | At template creation? | ? | ? |
NB: you can code almost anything yourself in or around most of these (except the GUI tools). The question is whether it's a hacky tack-on or part of the tool's design.
For example, the sentence generated to describe 24 hour front desk check-in/check-out services has over 6,000 unique variants.
Used SimpleNLG, but built an ontology KB and levels of document planning (macro and micro):
Procedural generation is, effectively, a way of getting 200% of the content with 400% of the work.
- Bruno Dias, in Procedural Storytelling in Game Design
Medium post series on creation
Subcutanean is a unique novel for print-on-demand or ebook platforms that changes for each new reader. Telling a queer coming of age story about parallel realities and creepy impossible basements, the novel is written in a bespoke format for variant text.
Variables to determine content segments:
Manual review of all generated content for "acceptance"
"Macro" text insertion
STORY
Data
Structure
Dwarf Fortress
Umap layout of
race/gender/age/social links
Look at outliers,
understand the groups
childless male dwarves
chewed up in battle
Get the population
stats
Find outliers,
query for them
Describe those.
Nino Bulbcreed, an elf, was outrageously bad at wrestling. Nidela Fordobey, an elf, was breathtakingly inept at wrestling. Baros Growlmartyr The Fierce Evil, an iguana fiend, was breathtakingly great at wrestling. Kadol Bendoars, a goblin, was shockingly good at wrestling.
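Lines like the wrestling ones above could come from a simple outlier pass: compute the population stats, keep the figures far from the mean, and describe only those. The sketch below uses z-scores with a cutoff of 2; the scores, the races of the ordinary figures, the cutoff, and the wording are all invented for illustration and are not the method used for the talk.

```python
import statistics

# Hypothetical wrestling scores; names come from the slides, numbers are made up.
FIGURES = [
    {"name": "Baros Growlmartyr", "race": "iguana fiend", "wrestling": 12000},
    {"name": "Ana Hoaryward", "race": "human", "wrestling": 1000},
    {"name": "Puja Coloreddive", "race": "human", "wrestling": 1200},
    {"name": "Thefin Luretrailed", "race": "human", "wrestling": 900},
    {"name": "Domi Chastebuds", "race": "elf", "wrestling": 1100},
    {"name": "Thruni Glazedspooned", "race": "elf", "wrestling": 1300},
    {"name": "Stral Lullhood", "race": "elf", "wrestling": 1000},
    {"name": "Rimtil Pantsear", "race": "goblin", "wrestling": 950},
    {"name": "Tise Mortalblossomed", "race": "goblin", "wrestling": 1150},
]

def find_outliers(figures, skill, z_cutoff=2.0):
    """Return (figure, z-score) pairs at least z_cutoff standard deviations from the mean."""
    values = [figure[skill] for figure in figures]
    mean = statistics.mean(values)
    spread = statistics.pstdev(values)
    if spread == 0:
        return []
    scored = [(figure, (figure[skill] - mean) / spread) for figure in figures]
    return [(figure, z) for figure, z in scored if abs(z) >= z_cutoff]

def describe(figure, z, skill):
    quality = "breathtakingly great" if z > 0 else "outrageously bad"
    article = "an" if figure["race"][0] in "aeiou" else "a"
    return f'{figure["name"]}, {article} {figure["race"]}, was {quality} at {skill}.'

lines = [describe(figure, z, "wrestling") for figure, z in find_outliers(FIGURES, "wrestling")]
print("\n".join(lines))
```

With skewed data like skill points, a median-based measure would probably be a better fit than z-scores; this version just keeps the arithmetic visible.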
Stongun Bluntfocused constructed the artifact "The Forest Retreat In Practice" at Frostyprairie, a Forest Retreat, in 55.
It seems that the most musical dwarf is female dwarf Goden Mantheater, who can play 4 types of instruments (no one else can).
Abbeyenjoy is a very literary place!
Tutorial at ACL 2019, Storytelling from Structured Data and Knowledge Graphs (slides) (site)
Awesome Natural Language Generation curated list (mostly neural)
Survey of State of the Art in NLG (Gatt & Krahmer 2018)
LiLiang's repo of some papers about data2text
Automating the News, Nicholas Diakopoulos
TableGPT: Few-shot Table-to-Text Generation with Table Structure Reconstruction and Content Matching (Gong et al 2020)
My table2text links from arXiv (updated regularly)