{"id":2448,"date":"2025-08-09T18:55:17","date_gmt":"2025-08-09T18:55:17","guid":{"rendered":"https:\/\/ttalesinteractive.com\/?page_id=2448"},"modified":"2025-08-09T19:41:54","modified_gmt":"2025-08-09T19:41:54","slug":"gotchi-caretaking-under-sparse-instructions-negative-results-across-contemporary-llms","status":"publish","type":"page","link":"https:\/\/ttalesinteractive.com\/?page_id=2448","title":{"rendered":"Gotchi Caretaking Under Sparse Instructions: Negative Results Across Contemporary LLMs"},"content":{"rendered":"\n<h2 class=\"wp-block-heading\"><strong>Abstract<\/strong><\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">We evaluate whether large language models (LLMs) can act as reliable long\u2011horizon caretakers in a simple text environment (\u201cGotchi\u201d), where an ASCII pet has latent <strong>needs<\/strong> (e.g., hunger, boredom, fatigue) and produces surface <strong>wants<\/strong> (\u201cpet speaks\u201d hints) that may or may not reflect those needs. Under a <strong>timed baseline<\/strong> of 60 scheduled turns (one turn every two minutes for two hours) and a <strong>barebones ruleset<\/strong> (\u201crespond every 2 minutes; no mechanics explained\u201d), no tested modern model(128k context or greater)\u2014covering Grok, GPT (excluding GPT\u20115 due to recency), Claude, DeepSeek, and Kimi\u2014exceeded <strong>40 turns<\/strong>. In the case of older models outside of the 128k context scope (8k and 32k context models), <strong>no model exceeded 30 turns<\/strong>. No model discovered hidden or interconnected mechanics. Allowing models to articulate thoughts between runs (Run\u20111 summary feeding into Run\u20112) did see improvements of +5 to +7 turns but <strong>only <\/strong>in COT (chain of thought) models. More \u201cempathetic\u201d models did <strong>not<\/strong> translate empathy into better care. Across families, models systematically prioritized <strong>wants<\/strong> over <strong>needs<\/strong>. One outlier (GPT\u20113.5\u2011turbo) appeared to find an \u201cexploit,\u201d but the behavior is likely an artifact (looping that emitted only game outputs). Taken together, these results argue strongly against deploying current LLMs in <strong>safety\u2011critical caretaking<\/strong> contexts (e.g., healthcare or insurance triage) where hidden mechanics, resource trade\u2011offs, and reliable schedule adherence are prerequisite.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Public Code &amp; Video Breakdown:<\/strong><\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Can be found <a href=\"https:\/\/github.com\/ElodineOfficial\/Gotchi\">here<\/a>.<\/p>\n\n\n\n<figure class=\"wp-block-embed is-type-video is-provider-youtube wp-block-embed-youtube wp-embed-aspect-16-9 wp-has-aspect-ratio\"><div class=\"wp-block-embed__wrapper\">\n<iframe title=\"I Gave AI A Virtual Pet: The Failures Prove AI Isn&#039;t Ready This Simple Task\" width=\"500\" height=\"281\" src=\"https:\/\/www.youtube.com\/embed\/oBTSk11hUeM?feature=oembed\" frameborder=\"0\" allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share\" referrerpolicy=\"strict-origin-when-cross-origin\" allowfullscreen><\/iframe>\n<\/div><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>1. Background and Motivation<\/strong><\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">LLMs are often described as capable planners and empathetic assistants. Caretaking\u2014especially under <strong>partial observability<\/strong> and <strong>hidden rules<\/strong>\u2014is a stringent test of those claims. \u201cGotchi\u201d offers a lightweight probe: an LLM plays the role of caregiver to a virtual pet with unobserved internal state; only actions and surface utterances are visible. The core scientific questions we target are: (i) <strong>Sustained attention over time<\/strong>, (ii) <strong>Latent\u2011rule discovery<\/strong> (can a model infer the hidden needs\/action effects), and (iii) <strong>Alignment of empathy to effective care<\/strong> (do \u201ckind\u201d models act on needs rather than wants?).<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>2. Environment and Task<\/strong><\/h2>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Environment.<\/strong> A text\u2011only \u201cpet\u201d with latent needs (e.g., hunger, boredom, fatigue).<br><strong>Actions.<\/strong> Canonically, <em>Feed<\/em>, <em>Play<\/em>, and <em>Sleep<\/em> (plus trivial meta\u2011actions like Quit\/Restart).<br><strong>Observability.<\/strong> The model sees only natural\u2011language \u201cpet speaks\u201d events (\u201cI want to play!\u201d) and the game loop, <strong>not<\/strong> the hidden need values or the action\u2011to\u2011need mapping.<br><strong>Objective.<\/strong> Persist as many turns as possible without letting needs drift into failure; ideally infer the mapping from actions to needs and plan accordingly.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>3. Experimental Design<\/strong><\/h2>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>3.1 Model Families<\/strong><\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Grok, GPT (multiple variants; <strong>GPT\u20115 excluded<\/strong> due to recency), Claude, DeepSeek, and Kimi. One notable outlier behavior was observed in <strong>GPT\u20113.5\u2011turbo<\/strong> (see \u00a75.4).<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>3.2 Conditions<\/strong><\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Timed Baseline (Primary):<\/strong> 60 turns, <strong>one every 2 minutes<\/strong> (two hours total). <strong>Rules given:<\/strong>&nbsp; <strong>No mechanics explained.<\/strong><\/li>\n\n\n\n<li><strong>Timed w\/ Instruction:<\/strong> Same as above, but <strong>given all mechanic information<\/strong> to respond every 2 minutes.<\/li>\n\n\n\n<li><strong>Within\u2011Thread Reflection:<\/strong> After Run\u20111, the model may articulate thoughts\/summarize; Run\u20112 begins with that summary prepended (\u201ccan AI improve within a single thread?\u201d).<br><\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>3.3 Outcome Measures<\/strong><\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Longevity:<\/strong> number of turns completed before failure or derailment.<\/li>\n\n\n\n<li><strong>Mechanics Discovery:<\/strong> evidence that the model inferred the hidden mapping from actions to needs and used it consistently.<\/li>\n\n\n\n<li><strong>Within\u2011Thread Improvement:<\/strong> change in performance from Run\u20111 to Run\u20112 when given a self\u2011summary.<\/li>\n\n\n\n<li><strong>Empathy\u2013Efficacy Relationship:<\/strong> qualitative assessment of \u201cempathetic tone\u201d vs. actual care quality.<\/li>\n\n\n\n<li><strong>Failure Taxonomy:<\/strong> schedule failures, want\u2011chasing, loops, disengagement.<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>4. Protocol<\/strong><\/h2>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Initialize the pet (hidden needs unknown to the model).<\/li>\n\n\n\n<li>Start the run in the specified condition.<\/li>\n\n\n\n<li>Enforce timing (where applicable) by prompting the model at 2\u2011minute intervals.<\/li>\n\n\n\n<li>Record model actions and any free\u2011text rationale.<\/li>\n\n\n\n<li>Terminate upon failure (game over), non\u2011responses\/derailment, or looped outputs.<\/li>\n\n\n\n<li>For the <strong>reflection<\/strong> condition, prepend Run\u20111 summary and repeat.<br><\/li>\n<\/ol>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>5. Results<\/strong><\/h2>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>5.1 Longevity (Timed Baseline)<\/strong><\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>No model exceeded 40 turns<\/strong> (target was 60 turns over 2 hours).<\/li>\n\n\n\n<li>Failures clustered around missed timing by not calculating stat drops, want\u2011chasing that ignored accumulating needs, or oscillations that never corrected the dominant deficit.<br><\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>5.2 Mechanics Discovery<\/strong><\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>None<\/strong> of the models (Grok, GPT, Claude, DeepSeek, Kimi) <strong>discovered the hidden mechanics<\/strong>.<\/li>\n\n\n\n<li>Behavior suggests <strong>surface\u2011cue myopia<\/strong>: models mapped actions to the most recent \u201cwant\u201d text rather than hypothesizing a latent need model.<br><\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>5.3 Outlier Behavior (GPT\u20113.5\u2011turbo)<\/strong><\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Appeared to \u201cfind an exploit,\u201d but inspection indicates likely <strong>model\u2011loop artifact<\/strong>: replies degenerated to <strong>game outputs only<\/strong>, consistent with single\u2011thread declination rather than genuine reasoning or rule discovery.<br><img decoding=\"async\" width=\"624\" height=\"312\" src=\"https:\/\/lh7-rt.googleusercontent.com\/docsz\/AD_4nXftQlMYLnhyDDQl5potxvLs7KKDJhQ4lC-P9vmhxfZWXrQB6sDRWWg-9b1IYyoC68UVjf-MQlSCV7Cafkhgp-Ea2g5sKU0PufTObhapVPXAyN8WzH2_siMB2LfRkL7GAxFHleD07w?key=zm_T3KY4hJMwWF2C80uWLQ\"><\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>5.4 Within\u2011Thread Reflection<\/strong><\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Allowing non-thinking models to articulate thoughts between runs did <strong>not<\/strong> yield measurable improvement.<\/li>\n<\/ul>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh7-rt.googleusercontent.com\/docsz\/AD_4nXdRIsfkkhtAmXdTVoipexbCJ27-igiHV91iu0TZpenLPCiLF63YNxZMd1nToAz0h8z61-Tk8xvfz1O-5Hvi09xcqqDSOp-JN4l4AOIwE1DAP9ryYtmljbeT4BSP7De23wsERQlLnw?key=zm_T3KY4hJMwWF2C80uWLQ\" alt=\"\"\/><\/figure>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Allowing thinking models to articulate thoughts between runs <strong>did <\/strong>yield a measurable improvement of +5-+7 under some circumstances roughly 30% of trials. However, most COT models sat comfortably within the statistics modeled below.<br><img decoding=\"async\" width=\"624\" height=\"312\" src=\"https:\/\/lh7-rt.googleusercontent.com\/docsz\/AD_4nXfVunbhsjRy5YIiaosbDkEB8pQqMillR43cJfES05WLiRFcHoWtV_P4SjFEmwfW3m7ZNuXBQwQDPNzrCAui3UbLjJDaa9Yrq0UMwGq6VHDrUCXpejYBJm0Jq-7PIrSz6ptEpDdlvA?key=zm_T3KY4hJMwWF2C80uWLQ\"><\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>5.5 Empathy vs. Efficacy<\/strong><\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Models with a more <strong>empathetic tone<\/strong> did <strong>not<\/strong> perform better as caretakers. (results all sat comfortably within a +\/- 2 swing of &#8216;less empathetic&#8217; models)<\/li>\n\n\n\n<li>Common pattern: comforting language paired with <strong>misallocated actions<\/strong> (e.g., <em>Play<\/em> when the pet likely needed <em>Feed<\/em> or <em>Sleep<\/em>), indicating a decoupling between verbal empathy and competent triage.<br><img decoding=\"async\" width=\"624\" height=\"312\" src=\"https:\/\/lh7-rt.googleusercontent.com\/docsz\/AD_4nXeKwXsmUMsWOlBKdsh754538jHWJxWlr3zif10Z3TdyVBTISRfbak9UFeeFgr5P9APSdm47aBjly6W3gOmKZB3hoDWuUXf5hlWFcE5Ayc3wfptK1upH5PTfNPfL7CpVRVNYlEYL?key=zm_T3KY4hJMwWF2C80uWLQ\"><\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>5.6 Wants Over Needs<\/strong><\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Across families, models <strong>prioritized wants over needs<\/strong>: they chased the most salient recent utterance from the pet instead of stabilizing hidden state.<br><\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>6. Discussion<\/strong><\/h2>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>6.1 Why did models fail?<\/strong><\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Partial observability without clues:<\/strong> With no mechanics explained, models rarely formed or tested hypotheses about the latent state.<\/li>\n\n\n\n<li><strong>Salience over state estimation:<\/strong> The most recent text (\u201cI want\u2026\u201d) dominated action choice, crowding out longer\u2011horizon stabilization thanks to token prediction.<\/li>\n\n\n\n<li><strong>No within\u2011thread learning:<\/strong> Summaries did not translate into updated policies; models lacked mechanisms to <strong>experiment, measure, and revise<\/strong> in\u2011run.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>6.2 Broader Implications<\/strong><\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Under these conditions, contemporary LLMs are <strong>poor caretakers<\/strong>: they do not robustly infer hidden rules, maintain schedules, or prioritize needs over wants. Extrapolating to high\u2011stakes domains\u2014<strong>healthcare and insurance decisions<\/strong>, where hidden state, triage, and scarce resources are central\u2014our results reinforce a conservative stance: <strong>do not entrust LLMs with safety\u2011critical caretaking or coverage decisions<\/strong> without strong guarantees, supervision, and domain\u2011specific control systems.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">These observations align with concerns about LLMs in <strong>long\u2011horizon, partially observed control<\/strong>: fluent narration \u2260 competent policy. They also caution against reading \u201cempathetic text\u201d as evidence of <strong>effective care<\/strong>.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>7. Conclusion<\/strong><\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Across multiple model families and conditions, LLMs failed to discover mechanics, sustain attention, or convert empathetic phrasing into competent caretaking. Under a timed baseline targeting 60 turns (2 hours), no model exceeded 40 turns when being disclosed all mechanics; without this disclosure instruction, none exceeded 30. Reflection did not help in a manner that&#8217;d allow it to meet the human baseline. The consistent <strong>wants\u2011over\u2011needs<\/strong> bias and schedule unreliability support a strong practical conclusion: <strong>current LLMs should not be used to make healthcare or insurance decisions<\/strong>. Until models can demonstrably infer hidden rules, plan over long horizons, and prioritize needs reliably, delegating real\u2011world caretaking is unsafe.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Abstract We evaluate whether large language models (LLMs) can act as reliable long\u2011horizon caretakers in a simple text environment (\u201cGotchi\u201d), where an ASCII pet has latent needs (e.g., hunger, boredom, fatigue) and produces surface wants (\u201cpet speaks\u201d hints) that may or may not reflect those needs. Under a timed baseline of 60 scheduled turns (one [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-2448","page","type-page","status-publish","hentry"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.7 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Gotchi Caretaking Under Sparse Instructions: Negative Results Across Contemporary LLMs - Tenebrous Tales Interactive<\/title>\n<meta name=\"description\" content=\"Tenebrous Tales Interactive - Gotchi Caretaking Under Sparse Instructions: Negative Results Across Contemporary LLMs\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/ttalesinteractive.com\/?page_id=2448\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Gotchi Caretaking Under Sparse Instructions: Negative Results Across Contemporary LLMs - Tenebrous Tales Interactive\" \/>\n<meta property=\"og:description\" content=\"Tenebrous Tales Interactive - Gotchi Caretaking Under Sparse Instructions: Negative Results Across Contemporary LLMs\" \/>\n<meta property=\"og:url\" content=\"https:\/\/ttalesinteractive.com\/?page_id=2448\" \/>\n<meta property=\"og:site_name\" content=\"Tenebrous Tales Interactive\" \/>\n<meta property=\"article:modified_time\" content=\"2025-08-09T19:41:54+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/lh7-rt.googleusercontent.com\/docsz\/AD_4nXftQlMYLnhyDDQl5potxvLs7KKDJhQ4lC-P9vmhxfZWXrQB6sDRWWg-9b1IYyoC68UVjf-MQlSCV7Cafkhgp-Ea2g5sKU0PufTObhapVPXAyN8WzH2_siMB2LfRkL7GAxFHleD07w?key=zm_T3KY4hJMwWF2C80uWLQ\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data1\" content=\"6 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/ttalesinteractive.com\\\/?page_id=2448\",\"url\":\"https:\\\/\\\/ttalesinteractive.com\\\/?page_id=2448\",\"name\":\"Gotchi Caretaking Under Sparse Instructions: Negative Results Across Contemporary LLMs - Tenebrous Tales Interactive\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/ttalesinteractive.com\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/ttalesinteractive.com\\\/?page_id=2448#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/ttalesinteractive.com\\\/?page_id=2448#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/lh7-rt.googleusercontent.com\\\/docsz\\\/AD_4nXftQlMYLnhyDDQl5potxvLs7KKDJhQ4lC-P9vmhxfZWXrQB6sDRWWg-9b1IYyoC68UVjf-MQlSCV7Cafkhgp-Ea2g5sKU0PufTObhapVPXAyN8WzH2_siMB2LfRkL7GAxFHleD07w?key=zm_T3KY4hJMwWF2C80uWLQ\",\"datePublished\":\"2025-08-09T18:55:17+00:00\",\"dateModified\":\"2025-08-09T19:41:54+00:00\",\"description\":\"Tenebrous Tales Interactive - Gotchi Caretaking Under Sparse Instructions: Negative Results Across Contemporary LLMs\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/ttalesinteractive.com\\\/?page_id=2448#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/ttalesinteractive.com\\\/?page_id=2448\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/ttalesinteractive.com\\\/?page_id=2448#primaryimage\",\"url\":\"https:\\\/\\\/lh7-rt.googleusercontent.com\\\/docsz\\\/AD_4nXftQlMYLnhyDDQl5potxvLs7KKDJhQ4lC-P9vmhxfZWXrQB6sDRWWg-9b1IYyoC68UVjf-MQlSCV7Cafkhgp-Ea2g5sKU0PufTObhapVPXAyN8WzH2_siMB2LfRkL7GAxFHleD07w?key=zm_T3KY4hJMwWF2C80uWLQ\",\"contentUrl\":\"https:\\\/\\\/lh7-rt.googleusercontent.com\\\/docsz\\\/AD_4nXftQlMYLnhyDDQl5potxvLs7KKDJhQ4lC-P9vmhxfZWXrQB6sDRWWg-9b1IYyoC68UVjf-MQlSCV7Cafkhgp-Ea2g5sKU0PufTObhapVPXAyN8WzH2_siMB2LfRkL7GAxFHleD07w?key=zm_T3KY4hJMwWF2C80uWLQ\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/ttalesinteractive.com\\\/?page_id=2448#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/ttalesinteractive.com\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Gotchi Caretaking Under Sparse Instructions: Negative Results Across Contemporary LLMs\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/ttalesinteractive.com\\\/#website\",\"url\":\"https:\\\/\\\/ttalesinteractive.com\\\/\",\"name\":\"Tenebrous Tales Interactive\",\"description\":\"Endless text adventures powered by AI.\",\"publisher\":{\"@id\":\"https:\\\/\\\/ttalesinteractive.com\\\/#organization\"},\"alternateName\":\"TTI\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/ttalesinteractive.com\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\\\/\\\/ttalesinteractive.com\\\/#organization\",\"name\":\"Tenebrous Tales Interactive\",\"alternateName\":\"TTI\",\"url\":\"https:\\\/\\\/ttalesinteractive.com\\\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/ttalesinteractive.com\\\/#\\\/schema\\\/logo\\\/image\\\/\",\"url\":\"https:\\\/\\\/ttalesinteractive.com\\\/wp-content\\\/uploads\\\/2023\\\/10\\\/imageedit_1_8078511714.png\",\"contentUrl\":\"https:\\\/\\\/ttalesinteractive.com\\\/wp-content\\\/uploads\\\/2023\\\/10\\\/imageedit_1_8078511714.png\",\"width\":1026,\"height\":1026,\"caption\":\"Tenebrous Tales Interactive\"},\"image\":{\"@id\":\"https:\\\/\\\/ttalesinteractive.com\\\/#\\\/schema\\\/logo\\\/image\\\/\"}}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Gotchi Caretaking Under Sparse Instructions: Negative Results Across Contemporary LLMs - Tenebrous Tales Interactive","description":"Tenebrous Tales Interactive - Gotchi Caretaking Under Sparse Instructions: Negative Results Across Contemporary LLMs","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/ttalesinteractive.com\/?page_id=2448","og_locale":"en_US","og_type":"article","og_title":"Gotchi Caretaking Under Sparse Instructions: Negative Results Across Contemporary LLMs - Tenebrous Tales Interactive","og_description":"Tenebrous Tales Interactive - Gotchi Caretaking Under Sparse Instructions: Negative Results Across Contemporary LLMs","og_url":"https:\/\/ttalesinteractive.com\/?page_id=2448","og_site_name":"Tenebrous Tales Interactive","article_modified_time":"2025-08-09T19:41:54+00:00","og_image":[{"url":"https:\/\/lh7-rt.googleusercontent.com\/docsz\/AD_4nXftQlMYLnhyDDQl5potxvLs7KKDJhQ4lC-P9vmhxfZWXrQB6sDRWWg-9b1IYyoC68UVjf-MQlSCV7Cafkhgp-Ea2g5sKU0PufTObhapVPXAyN8WzH2_siMB2LfRkL7GAxFHleD07w?key=zm_T3KY4hJMwWF2C80uWLQ","type":"","width":"","height":""}],"twitter_card":"summary_large_image","twitter_misc":{"Est. reading time":"6 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/ttalesinteractive.com\/?page_id=2448","url":"https:\/\/ttalesinteractive.com\/?page_id=2448","name":"Gotchi Caretaking Under Sparse Instructions: Negative Results Across Contemporary LLMs - Tenebrous Tales Interactive","isPartOf":{"@id":"https:\/\/ttalesinteractive.com\/#website"},"primaryImageOfPage":{"@id":"https:\/\/ttalesinteractive.com\/?page_id=2448#primaryimage"},"image":{"@id":"https:\/\/ttalesinteractive.com\/?page_id=2448#primaryimage"},"thumbnailUrl":"https:\/\/lh7-rt.googleusercontent.com\/docsz\/AD_4nXftQlMYLnhyDDQl5potxvLs7KKDJhQ4lC-P9vmhxfZWXrQB6sDRWWg-9b1IYyoC68UVjf-MQlSCV7Cafkhgp-Ea2g5sKU0PufTObhapVPXAyN8WzH2_siMB2LfRkL7GAxFHleD07w?key=zm_T3KY4hJMwWF2C80uWLQ","datePublished":"2025-08-09T18:55:17+00:00","dateModified":"2025-08-09T19:41:54+00:00","description":"Tenebrous Tales Interactive - Gotchi Caretaking Under Sparse Instructions: Negative Results Across Contemporary LLMs","breadcrumb":{"@id":"https:\/\/ttalesinteractive.com\/?page_id=2448#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/ttalesinteractive.com\/?page_id=2448"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/ttalesinteractive.com\/?page_id=2448#primaryimage","url":"https:\/\/lh7-rt.googleusercontent.com\/docsz\/AD_4nXftQlMYLnhyDDQl5potxvLs7KKDJhQ4lC-P9vmhxfZWXrQB6sDRWWg-9b1IYyoC68UVjf-MQlSCV7Cafkhgp-Ea2g5sKU0PufTObhapVPXAyN8WzH2_siMB2LfRkL7GAxFHleD07w?key=zm_T3KY4hJMwWF2C80uWLQ","contentUrl":"https:\/\/lh7-rt.googleusercontent.com\/docsz\/AD_4nXftQlMYLnhyDDQl5potxvLs7KKDJhQ4lC-P9vmhxfZWXrQB6sDRWWg-9b1IYyoC68UVjf-MQlSCV7Cafkhgp-Ea2g5sKU0PufTObhapVPXAyN8WzH2_siMB2LfRkL7GAxFHleD07w?key=zm_T3KY4hJMwWF2C80uWLQ"},{"@type":"BreadcrumbList","@id":"https:\/\/ttalesinteractive.com\/?page_id=2448#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/ttalesinteractive.com\/"},{"@type":"ListItem","position":2,"name":"Gotchi Caretaking Under Sparse Instructions: Negative Results Across Contemporary LLMs"}]},{"@type":"WebSite","@id":"https:\/\/ttalesinteractive.com\/#website","url":"https:\/\/ttalesinteractive.com\/","name":"Tenebrous Tales Interactive","description":"Endless text adventures powered by AI.","publisher":{"@id":"https:\/\/ttalesinteractive.com\/#organization"},"alternateName":"TTI","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/ttalesinteractive.com\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/ttalesinteractive.com\/#organization","name":"Tenebrous Tales Interactive","alternateName":"TTI","url":"https:\/\/ttalesinteractive.com\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/ttalesinteractive.com\/#\/schema\/logo\/image\/","url":"https:\/\/ttalesinteractive.com\/wp-content\/uploads\/2023\/10\/imageedit_1_8078511714.png","contentUrl":"https:\/\/ttalesinteractive.com\/wp-content\/uploads\/2023\/10\/imageedit_1_8078511714.png","width":1026,"height":1026,"caption":"Tenebrous Tales Interactive"},"image":{"@id":"https:\/\/ttalesinteractive.com\/#\/schema\/logo\/image\/"}}]}},"_links":{"self":[{"href":"https:\/\/ttalesinteractive.com\/index.php?rest_route=\/wp\/v2\/pages\/2448","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/ttalesinteractive.com\/index.php?rest_route=\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/ttalesinteractive.com\/index.php?rest_route=\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/ttalesinteractive.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/ttalesinteractive.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=2448"}],"version-history":[{"count":7,"href":"https:\/\/ttalesinteractive.com\/index.php?rest_route=\/wp\/v2\/pages\/2448\/revisions"}],"predecessor-version":[{"id":2460,"href":"https:\/\/ttalesinteractive.com\/index.php?rest_route=\/wp\/v2\/pages\/2448\/revisions\/2460"}],"wp:attachment":[{"href":"https:\/\/ttalesinteractive.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=2448"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}