Brainwind is a markup convention — not a library — that gives LLM responses a machine-readable semantic layer without separating the structured data from the human-readable content. The document is the data.
Ask an LLM about an institution. It returns readable prose and machine-parseable entity nodes in the same HTML document. The screenshots below show the raw annotated output (left) and the four entities extracted from it with a single `querySelectorAll` (right).
Ask an LLM about an institutional investor and you get prose. Useful for reading, useless for processing. Ask it for JSON and you lose the natural language context entirely. There is no standard middle ground — a response that is both readable and parseable in the same document.
This is not a new problem. The semantic web community spent a decade on RDFa, Microdata, and Microformats — all technically sound, all largely ignored in practice because they required developers to manually annotate existing content. The adoption cost was too high for the benefit, which was mostly invisible to end users.
LLMs change the distribution model. Instead of asking developers to annotate content, you put the convention in a system prompt and the LLM produces it natively. The annotation cost drops to zero — or rather, it shifts from humans to inference time.
Whether that tradeoff holds in practice — consistent output, valid taxonomy values, reliable slugs — is what this project is exploring.
The convention is established once in the system prompt and holds for the conversation. It defines three things: the markup pattern, the property-bag syntax, and the taxonomy of valid values.
The taxonomy is written in the same `Category:Value/SubValue` path notation as `data-bw-props` itself. The LLM learns one syntax, not two.
```text
# When your response contains named entities, annotate them using Brainwind.
# Wrap your entire response in a .bw-context element.

## Node

<div class="bw-node"
     data-bw-type="[Schema.org type]"
     data-bw-id="[unique-slug]"
     data-bw-props="[property bag]">
  <span data-bw-label>[Name]</span>
  <span data-bw-role>[Role in context]</span>
</div>

## Property bag syntax

# Entries separated by |   Key/value by :   Hierarchy by /
data-bw-props="AssetClass:Equities/All cap/Core|Region:Global|ManagementApproach:Passive"

## Relationship edge

<span class="bw-edge" data-bw-rel="[verb]"
      data-bw-from="[source-id]" data-bw-to="[target-id]">human-readable phrase</span>

## Taxonomy (excerpt — use only these values)

# AssetClass
Equities/All cap/Core · Equities/Large cap/Value · Bonds/Government bonds
Real Estate/Direct Real Estate/Office · Alternatives/Infrastructure/unlisted infrastructure

# Region
Global · US · Asia Pacific/Japan · Europe ex UK/Nordics · Emerging Markets

# ManagementApproach
Active · Passive · Enhanced Indexing · Other

# MandateStatus
Tender · Potential · Under Review · Completed · Termination · To Expire
```
The full asset class taxonomy runs to ~300 lines. Pasting it into every system prompt is expensive in tokens and still doesn't prevent the LLM from inventing plausible-sounding but invalid paths like `Equities/Large Growth` instead of `Equities/Large cap/Growth`. Taxonomy drift is real. A validation pass on `data-bw-props` values against known paths is worth building into your parser; treat unknown values as warnings, not errors, so output stays usable.
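A minimal sketch of such a pass, assuming the props string is well formed. The path set here is a tiny stand-in for the real taxonomy, and `validateProps` is my name, not part of the convention:

```javascript
// Validate a data-bw-props string against a set of known taxonomy paths.
// Unknown values become warnings rather than errors, so output stays usable.
// KNOWN_PATHS is an illustrative stand-in for the full ~300-line taxonomy.
const KNOWN_PATHS = new Set([
  'AssetClass:Equities/All cap/Core',
  'AssetClass:Equities/Large cap/Growth',
  'Region:Global',
  'ManagementApproach:Passive',
]);

function validateProps(propsString, knownPaths = KNOWN_PATHS) {
  const warnings = [];
  for (const entry of propsString.split('|')) {
    const [key, value] = entry.split(/:(.*)/s); // split on the first ':' only
    if (!value) {
      warnings.push(`Malformed entry: "${entry}"`);
    } else if (!knownPaths.has(`${key}:${value}`)) {
      warnings.push(`Unknown taxonomy value: "${entry}"`);
    }
  }
  return warnings;
}
```

Running it over a drifted props string flags only the invented path, leaving the valid facets untouched.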
The user writes natural language. No special syntax on their end. The semantic structure comes out of the model as a consequence of the system prompt — not of the question.
This is worth noting because it means Brainwind doesn't require a special UI or a structured input form. Any chat interface or API call works. The semantic layer is generated as a side effect of answering the question, not as a separate task.
The LLM produces HTML. A person reads the prose. A parser reads the `bw-node` elements. Neither representation is secondary; they coexist in the same file.
```html
<div class="bw-context" data-bw-vocab="https://schema.org/">
  <p>The CTPF is a major institutional investor managing retirement assets
     for public school teachers in Chicago...</p>

  <div class="bw-node" data-bw-type="Organization" data-bw-id="ctpf"
       data-bw-props="Region:US">
    <span data-bw-label>Chicago Public School Teachers' Pension and Retirement Fund (CTPF)</span>
    <span data-bw-role>Pension Fund / Institutional Investor</span>
  </div>

  <div class="bw-node" data-bw-type="Person" data-bw-id="carlton-lenoir">
    <span data-bw-label>Carlton W. Lenoir, Sr.</span>
    <span data-bw-role>Executive Director</span>
  </div>

  <span class="bw-edge" data-bw-rel="worksFor" data-bw-from="carlton-lenoir"
        data-bw-to="ctpf">serves as Executive Director for</span>
</div>
```
Because the output is HTML, it sits naturally inside the existing web toolchain. Tailwind styles the pixels. Brainwind styles the semantics. HTMX can make nodes interactive: an `hx-get` on a `bw-node` fetches additional data from an API on click, with the node's `data-bw-id` as the lookup key. None of this requires changes to the Brainwind convention; it is a consequence of staying inside HTML.
Tailwind and HTMX work because they target HTML, not because they have any awareness of Brainwind. Tailwind styles whatever classes are present. HTMX responds to whatever `hx-*` attributes you add. You could combine them, but that means adding Tailwind classes or HTMX attributes to the LLM output, which complicates the system prompt and increases the chance of malformed markup. It works, but it isn't free.
Because Brainwind uses standard HTML attributes, extraction needs no library. `querySelectorAll` is the query engine.
```js
// Every bw-node in the document, as plain objects
const nodes = [...document.querySelectorAll('.bw-node')].map(el => ({
  id:    el.dataset.bwId,
  type:  el.dataset.bwType,
  label: el.querySelector('[data-bw-label]')?.textContent.trim(),
  role:  el.querySelector('[data-bw-role]')?.textContent.trim(),
  props: el.dataset.bwProps  // still a raw string — parse separately
}));
```
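Parsing the raw props string is only a few more lines. A sketch, assuming well-formed `Key:Value` entries; the function name is mine, not part of the convention:

```javascript
// Parse a data-bw-props string into an object of facet arrays.
// "AssetClass:Equities/All cap/Core|Region:Global"
//   → { AssetClass: ['Equities', 'All cap', 'Core'], Region: ['Global'] }
function parseProps(propsString = '') {
  return Object.fromEntries(
    propsString
      .split('|')
      .filter(Boolean)            // tolerate an empty or missing attribute
      .map(entry => {
        const [key, value = ''] = entry.split(/:(.*)/s); // first ':' only
        return [key, value.split('/')]; // hierarchy levels as an array
      })
  );
}
```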
Getting nodes out of the DOM is trivial. The hard questions come after: How do you validate `data-bw-props` paths against the taxonomy? How do you deduplicate entities across conversations, where the same organisation appears with slightly different labels? How do you handle the LLM generating a `data-bw-id` of `ctpf-fund` in one turn and `ctpf` in the next? None of these are solved by the convention; they are the work left to build on top of it.
Brainwind is not a new ontology. The attribute values are designed to resolve against established standards. `data-bw-type="Organization"` is a Schema.org class. `data-bw-vocab` declares the namespace the same way JSON-LD's `@context` does. `data-bw-rel="worksFor"` is a Schema.org property.
This means a Brainwind document is already an implicit RDF graph. Each `bw-node` is a subject. `data-bw-type` is `rdf:type`. `data-bw-props` entries are predicate–object pairs. `bw-edge` elements are explicit triples. A serialiser can emit valid JSON-LD or Turtle from any Brainwind page without loss.
| Brainwind attribute | Maps to |
|---|---|
| `data-bw-vocab` | JSON-LD `@context` / RDF namespace declaration |
| `data-bw-type` | Schema.org class · `rdf:type` |
| `data-bw-id` | JSON-LD `@id` · local slug, extensible to URI |
| `data-bw-props` | Multiple predicate–object pairs (property bag) |
| `data-bw-rel` (on `bw-edge`) | Schema.org property · RDF predicate |
| `data-bw-label` | `rdfs:label` · Schema.org `name` |
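Following that mapping, a minimal JSON-LD serialiser over the extracted node objects could look like this. A sketch under stated assumptions: `props` is assumed pre-parsed into flat `{ Key: 'Value/SubValue' }` string pairs, edges are omitted, and the output is not run through a full JSON-LD processor:

```javascript
// Emit a JSON-LD document from extracted Brainwind nodes,
// per the attribute mapping above. Edges are left out of this sketch.
function toJsonLd(nodes, vocab = 'https://schema.org/') {
  return {
    '@context': vocab,
    '@graph': nodes.map(node => ({
      '@id': node.id,       // local slug; extensible to a full URI
      '@type': node.type,   // Schema.org class
      name: node.label,     // data-bw-label → rdfs:label / schema:name
      ...node.props,        // flat predicate–object pairs from the property bag
    })),
  };
}
```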
Brainwind is not a substitute for a proper knowledge graph if you need inference, SPARQL queries, or OWL reasoning. It is a lightweight annotation layer — closer to Microformats in ambition than to a full RDF store. The value is in the generation model and the zero-dependency extraction, not in the expressiveness of the data model.
The `data-bw-props` property bag is deliberately flat: a serialised set of facets, not a graph of triples. This makes it easy to produce and easy to parse, at the cost of not being able to express nested or qualified statements. That tradeoff is intentional for the LLM output use case.
The longer-term path is for attribute values to become full URIs: `https://schema.org/Organization` for `data-bw-type` rather than the bare `Organization`, and a Wikidata QID for `data-bw-id` rather than a local slug. That would make Brainwind documents directly linkable into the wider linked data web. For now, local slugs are a practical starting point that keeps the system prompt short.
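A lookup-table sketch of that upgrade path. The table entry below is a placeholder, not a verified mapping; a real resolver would query an authority such as Wikidata:

```javascript
// Expand local data-bw-id slugs to full URIs via a lookup table.
// The entry below is a placeholder URI, not a real identifier.
const ID_TABLE = {
  ctpf: 'https://example.org/entity/ctpf',
};

function expandId(slug, table = ID_TABLE) {
  return table[slug] ?? slug; // unknown slugs pass through unchanged
}
```

Falling back to the local slug keeps documents valid even when only some entities have been resolved.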