Brainwind is a markup convention — not a library — that gives LLM responses a machine-readable semantic layer without separating the structured data from the human-readable content. The document is the data.
Ask an LLM about an institution. It returns readable prose and machine-parseable entity nodes in the same HTML document. The screenshots below show the raw annotated output (left) and the four entities extracted from it with a single `querySelectorAll` (right).
Ask an LLM about an institutional investor and you get prose. Useful for reading, useless for processing. Ask it for JSON and you lose the natural language context entirely. There is no standard middle ground — a response that is both readable and parseable in the same document.
This is not a new problem. The semantic web community spent a decade on RDFa, Microdata, and Microformats — all technically sound, all largely ignored in practice because they required developers to manually annotate existing content. The adoption cost was too high for the benefit, which was mostly invisible to end users.
LLMs change the distribution model. Instead of asking developers to annotate content, you put the convention in a system prompt and the LLM produces it natively. The annotation cost drops to zero — or rather, it shifts from humans to inference time.
Whether that tradeoff holds in practice — consistent output, valid taxonomy values, reliable slugs — is what this project is exploring.
The convention is established once in the system prompt and holds for the conversation. It defines three things: the markup pattern, the property-bag syntax, and the taxonomy of valid values.
The taxonomy is written in the same `Category:Value/SubValue` path notation as `data-bw-props` itself. The LLM learns one syntax, not two.
```text
# When your response contains named entities, annotate them using Brainwind.
# Wrap your entire response in a .bw-context element.

## Node

<div class="bw-node"
     data-bw-type="[Schema.org type]"
     data-bw-id="[unique-slug]"
     data-bw-props="[property bag]">
  <span data-bw-label>[Name]</span>
  <span data-bw-role>[Role in context]</span>
</div>

## Property bag syntax

# Entries separated by |   Key/value by :   Hierarchy by /
data-bw-props="AssetClass:Equities/All cap/Core|Region:Global|ManagementApproach:Passive"

## Relationship edge

<span class="bw-edge" data-bw-rel="[verb]"
      data-bw-from="[source-id]" data-bw-to="[target-id]">human-readable phrase</span>

## Taxonomy (excerpt — use only these values)

# AssetClass
Equities/All cap/Core · Equities/Large cap/Value · Bonds/Government bonds
Real Estate/Direct Real Estate/Office · Alternatives/Infrastructure/unlisted infrastructure

# Region
Global · US · Asia Pacific/Japan · Europe ex UK/Nordics · Emerging Markets

# ManagementApproach
Active · Passive · Enhanced Indexing · Other

# MandateStatus
Tender · Potential · Under Review · Completed · Termination · To Expire
```
The full asset class taxonomy runs to ~300 lines. Pasting it into every system prompt is expensive in tokens and still doesn't prevent the LLM from inventing plausible-sounding but invalid paths like `Equities/Large Growth` instead of `Equities/Large cap/Growth`. Taxonomy drift is real. A validation pass on `data-bw-props` values against known paths is worth building into your parser; treat unknown values as warnings, not errors, so output stays usable.
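A minimal sketch of such a pass, assuming the props string is well formed. The path set here is a tiny stand-in for the real taxonomy, and `validateProps` is my name, not part of the convention:

```javascript
// Validate a data-bw-props string against a set of known taxonomy paths.
// Unknown values become warnings rather than errors, so output stays usable.
// KNOWN_PATHS is an illustrative stand-in for the full ~300-line taxonomy.
const KNOWN_PATHS = new Set([
  'AssetClass:Equities/All cap/Core',
  'AssetClass:Equities/Large cap/Growth',
  'Region:Global',
  'ManagementApproach:Passive',
]);

function validateProps(propsString, knownPaths = KNOWN_PATHS) {
  const warnings = [];
  for (const entry of propsString.split('|')) {
    const [key, value] = entry.split(/:(.*)/s); // split on the first ':' only
    if (!value) {
      warnings.push(`Malformed entry: "${entry}"`);
    } else if (!knownPaths.has(`${key}:${value}`)) {
      warnings.push(`Unknown taxonomy value: "${entry}"`);
    }
  }
  return warnings;
}
```

Running it over a drifted props string flags only the invented path, leaving the valid facets untouched.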
The user writes natural language. No special syntax on their end. The semantic structure comes out of the model as a consequence of the system prompt — not of the question.
This is worth noting because it means Brainwind doesn't require a special UI or a structured input form. Any chat interface or API call works. The semantic layer is generated as a side effect of answering the question, not as a separate task.
The LLM produces HTML. A person reads the prose. A parser reads the `bw-node` elements. Neither representation is secondary; they coexist in the same file.
```html
<div class="bw-context" data-bw-vocab="https://schema.org/">
  <p>The CTPF is a major institutional investor managing retirement assets
     for public school teachers in Chicago...</p>

  <div class="bw-node" data-bw-type="Organization" data-bw-id="ctpf"
       data-bw-props="Region:US">
    <span data-bw-label>Chicago Public School Teachers' Pension and Retirement Fund (CTPF)</span>
    <span data-bw-role>Pension Fund / Institutional Investor</span>
  </div>

  <div class="bw-node" data-bw-type="Person" data-bw-id="carlton-lenoir">
    <span data-bw-label>Carlton W. Lenoir, Sr.</span>
    <span data-bw-role>Executive Director</span>
  </div>

  <span class="bw-edge" data-bw-rel="worksFor" data-bw-from="carlton-lenoir"
        data-bw-to="ctpf">serves as Executive Director for</span>
</div>
```
Because the output is HTML, it sits naturally inside the existing web toolchain. Tailwind styles the pixels. Brainwind styles the semantics. HTMX can make nodes interactive: an `hx-get` on a `bw-node` fetches additional data from an API on click, with the node's `data-bw-id` as the lookup key. None of this requires changes to the Brainwind convention; it is a consequence of staying inside HTML.
Tailwind and HTMX work because they target HTML, not because they have any awareness of Brainwind. Tailwind styles whatever classes are present. HTMX responds to whatever `hx-*` attributes you add. You could combine them, but that means adding Tailwind classes or HTMX attributes to the LLM output, which complicates the system prompt and increases the chance of malformed markup. It works, but it isn't free.
Because Brainwind uses standard HTML attributes, extraction needs no library. `querySelectorAll` is the query engine.
```js
// Every bw-node in the document, as plain objects
const nodes = [...document.querySelectorAll('.bw-node')].map(el => ({
  id:    el.dataset.bwId,
  type:  el.dataset.bwType,
  label: el.querySelector('[data-bw-label]')?.textContent.trim(),
  role:  el.querySelector('[data-bw-role]')?.textContent.trim(),
  props: el.dataset.bwProps  // still a raw string — parse separately
}));
```
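Parsing the raw props string is only a few more lines. A sketch, assuming well-formed `Key:Value` entries; the function name is mine, not part of the convention:

```javascript
// Parse a data-bw-props string into an object of facet arrays.
// "AssetClass:Equities/All cap/Core|Region:Global"
//   → { AssetClass: ['Equities', 'All cap', 'Core'], Region: ['Global'] }
function parseProps(propsString = '') {
  return Object.fromEntries(
    propsString
      .split('|')
      .filter(Boolean)            // tolerate an empty or missing attribute
      .map(entry => {
        const [key, value = ''] = entry.split(/:(.*)/s); // first ':' only
        return [key, value.split('/')]; // hierarchy levels as an array
      })
  );
}
```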
Getting nodes out of the DOM is trivial. The hard questions come after: How do you validate `data-bw-props` paths against the taxonomy? How do you deduplicate entities across conversations, where the same organisation appears with slightly different labels? How do you handle the LLM generating a `data-bw-id` of `ctpf-fund` in one turn and `ctpf` in the next? None of these are solved by the convention; they are the work left to build on top of it.
Brainwind is not a new ontology. The attribute values are designed to resolve against established standards. `data-bw-type="Organization"` is a Schema.org class. `data-bw-vocab` declares the namespace the same way JSON-LD's `@context` does. `data-bw-rel="worksFor"` is a Schema.org property.
This means a Brainwind document is already an implicit RDF graph. Each `bw-node` is a subject. `data-bw-type` is `rdf:type`. `data-bw-props` entries are predicate–object pairs. `bw-edge` elements are explicit triples. A serialiser can emit valid JSON-LD or Turtle from any Brainwind page without loss.
| Brainwind attribute | Maps to |
|---|---|
| `data-bw-vocab` | JSON-LD `@context` / RDF namespace declaration |
| `data-bw-type` | Schema.org class · `rdf:type` |
| `data-bw-id` | JSON-LD `@id` · local slug, extensible to URI |
| `data-bw-props` | Multiple predicate–object pairs (property bag) |
| `data-bw-rel` (on `bw-edge`) | Schema.org property · RDF predicate |
| `data-bw-label` | `rdfs:label` · Schema.org `name` |
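Following that mapping, a minimal JSON-LD serialiser over the extracted node objects could look like this. A sketch under stated assumptions: `props` is assumed pre-parsed into flat `{ Key: 'Value/SubValue' }` string pairs, edges are omitted, and the output is not run through a full JSON-LD processor:

```javascript
// Emit a JSON-LD document from extracted Brainwind nodes,
// per the attribute mapping above. Edges are left out of this sketch.
function toJsonLd(nodes, vocab = 'https://schema.org/') {
  return {
    '@context': vocab,
    '@graph': nodes.map(node => ({
      '@id': node.id,       // local slug; extensible to a full URI
      '@type': node.type,   // Schema.org class
      name: node.label,     // data-bw-label → rdfs:label / schema:name
      ...node.props,        // flat predicate–object pairs from the property bag
    })),
  };
}
```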
Brainwind is not a substitute for a proper knowledge graph if you need inference, SPARQL queries, or OWL reasoning. It is a lightweight annotation layer — closer to Microformats in ambition than to a full RDF store. The value is in the generation model and the zero-dependency extraction, not in the expressiveness of the data model.
The `data-bw-props` property bag is deliberately flat: a serialised set of facets, not a graph of triples. This makes it easy to produce and easy to parse, at the cost of not being able to express nested or qualified statements. That tradeoff is intentional for the LLM output use case.
The longer-term path is for attribute values to become full URIs: `https://schema.org/Organization` for `data-bw-type` rather than the bare `Organization`, and a Wikidata QID for `data-bw-id` rather than a local slug. That would make Brainwind documents directly linkable into the wider linked data web. For now, local slugs are a practical starting point that keeps the system prompt short.
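A lookup-table sketch of that upgrade path. The table entry below is a placeholder, not a verified mapping; a real resolver would query an authority such as Wikidata:

```javascript
// Expand local data-bw-id slugs to full URIs via a lookup table.
// The entry below is a placeholder URI, not a real identifier.
const ID_TABLE = {
  ctpf: 'https://example.org/entity/ctpf',
};

function expandId(slug, table = ID_TABLE) {
  return table[slug] ?? slug; // unknown slugs pass through unchanged
}
```

Falling back to the local slug keeps documents valid even when only some entities have been resolved.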