Parsers & writers
The parse layer turns text into TreeBuffers (plus an
AnnotationTable); the write layer serializes back
out. Number formatting follows Java’s Double.toString / Integer.toString
semantics so FigTree and BEAST files round-trip byte-for-byte.
Format detection
```ts
import { detectFormat } from "insomni-phylo";

detectFormat("#NEXUS\n..."); // "nexus"
detectFormat("((A,B),C);"); // "newick"
```

detectFormat is a cheap leading-token sniff: a #NEXUS header → "nexus",
otherwise "newick". TreeFormat is "newick" | "nexus".
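To make the "leading-token sniff" idea concrete, here is a minimal standalone sketch. It is not the library's implementation: the name sniffFormat, the whitespace skipping, and the case-insensitive match are all assumptions made for illustration.

```ts
type SniffedFormat = "newick" | "nexus";

// Hypothetical helper (not part of insomni-phylo): skip leading whitespace,
// then look only at the first six characters of the remaining text.
function sniffFormat(source: string): SniffedFormat {
  let i = 0;
  while (i < source.length && /\s/.test(source.charAt(i))) i++;
  // Assumption: the header is matched case-insensitively.
  const head = source.slice(i, i + 6).toUpperCase();
  return head === "#NEXUS" ? "nexus" : "newick";
}
```

Because only a handful of leading characters are inspected, the cost does not grow with the size of the tree that follows.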
Newick & NHX
parseNewick is a single-pass scanner that pre-sizes its buffers from the
source length, so even 500k-leaf trees never reallocate mid-parse.

```ts
import { AnnotationTable, NewickParseError, parseNewick } from "insomni-phylo";

const annotations = new AnnotationTable();
const diagnostics = [];
try {
  const tree = parseNewick("((A:0.1,B:0.2):0.3,C:0.4);", {
    annotations, // [&k=v] and [&&NHX:...] blocks are extracted into this table
    diagnostics, // recoverable parse warnings collected here
  });
} catch (err) {
  if (err instanceof NewickParseError) console.error(err.message, err.pos);
}
```

ParseNewickOptions has two optional fields: annotations (an
AnnotationTable to populate) and diagnostics (a ParseDiagnostic[] sink).
NewickParseError carries a pos source offset.
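A raw offset is awkward to show to a user, so error reporting usually converts it to a line and column first. The helper below is a hypothetical sketch, not a library export, and it assumes pos is a 0-based offset into the source string.

```ts
// Hypothetical helper: turn a 0-based source offset into a 1-based
// line/column pair by counting the newlines that precede it.
function offsetToLineColumn(
  source: string,
  pos: number,
): { line: number; column: number } {
  // Clamp so that out-of-range offsets still yield a usable position.
  const clamped = Math.max(0, Math.min(pos, source.length));
  let line = 1;
  let lineStart = 0;
  for (let i = 0; i < clamped; i++) {
    if (source.charCodeAt(i) === 10) { // "\n"
      line++;
      lineStart = i + 1;
    }
  }
  return { line, column: clamped - lineStart + 1 };
}
```

In a catch block, this could be called as offsetToLineColumn(text, err.pos) before printing the message.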
Annotation bodies are parsed by parseAnnotationBody(body, table, nodeIdx),
which is exported for direct use and handles both conventions:
- FigTree / BEAST: &key=value,key=value,… (comma-separated)
- NHX: &&NHX:key=value:key=value:… (colon-separated)
Both support quoted strings, barewords, numbers, booleans, and {a,b,c} nested
lists. Plain bracketed comments (no leading &) are ignored.
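The two conventions differ mainly in their separator, which a simplified splitter can illustrate. The sketch below is not parseAnnotationBody: the function names are hypothetical, values are left as raw strings rather than typed, and escape sequences inside quotes are not handled.

```ts
// Split `text` on `sep`, ignoring separators inside quotes or {…} lists.
function splitTopLevel(text: string, sep: string): string[] {
  const parts: string[] = [];
  let depth = 0;  // current {…} nesting depth
  let quote = ""; // the open quote character, or "" when outside quotes
  let start = 0;
  for (let i = 0; i < text.length; i++) {
    const ch = text.charAt(i);
    if (quote !== "") {
      if (ch === quote) quote = "";
    } else if (ch === '"' || ch === "'") {
      quote = ch;
    } else if (ch === "{") {
      depth++;
    } else if (ch === "}") {
      depth--;
    } else if (ch === sep && depth === 0) {
      parts.push(text.slice(start, i));
      start = i + 1;
    }
  }
  parts.push(text.slice(start));
  return parts;
}

// Hypothetical helper: return raw [key, value] pairs for either convention.
function splitAnnotationBody(body: string): Array<[string, string]> {
  const isNhx = body.startsWith("&&NHX:");
  const payload = isNhx ? body.slice("&&NHX:".length) : body.replace(/^&/, "");
  const pairs: Array<[string, string]> = [];
  for (const part of splitTopLevel(payload, isNhx ? ":" : ",")) {
    const eq = part.indexOf("=");
    if (eq > 0) pairs.push([part.slice(0, eq).trim(), part.slice(eq + 1).trim()]);
  }
  return pairs;
}
```

Tracking quote state and brace depth is what keeps a value such as {1,2,3} or "a,b" from being split at its inner commas.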
parseNexus returns a NexusDocument — it parses every TREES block, the TAXA
labels, the FIGTREE settings block, and the TRANSLATE map, and preserves the
verbatim text of blocks it does not model (DATA, CHARACTERS, custom BEAST
blocks) so round-trips never silently drop payloads.
```ts
import { NexusParseError, parseNexus } from "insomni-phylo";

const doc = parseNexus(nexusText);
for (const t of doc.trees) {
  console.log(t.name, t.isDefault, t.rooted, t.tree.tipCount);
}
doc.figtreeSettings; // Map<string, string> of `set key=value;` entries, raw
```

NexusDocument fields:
| Field | Meaning |
|---|---|
| taxa: string[] | TAXA labels, in declared order. |
| trees: NexusTree[] | Each carries name, rooted ([&R]/[&U]/null), isDefault, tree, annotations. |
| figtreeSettings: Map<string,string> | FIGTREE-block set entries, verbatim. |
| translate: Map<string,string> | TRANSLATE map (numeric id → label). |
| preservedBlocks?: string[] | Verbatim source of unmodeled blocks. |
| seenBlocks: string[] | Top-level block names in order (derailment detection). |
| warnings: string[] / diagnostics: ParseDiagnostic[] | Recoverable issues. |
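As an illustration of how a TRANSLATE map of this shape can be consumed, here is a small lookup sketch. Whether parseNexus already applies the map to tip names is not stated here, so treat resolveTipLabel as a hypothetical helper working on a plain Map, with made-up example data.

```ts
// Hypothetical helper: look a stored tip name up in a TRANSLATE-style map,
// falling back to the name itself when there is no entry.
function resolveTipLabel(
  translate: ReadonlyMap<string, string>,
  stored: string,
): string {
  return translate.get(stored) ?? stored;
}

// Example data, not taken from any real file.
const exampleTranslate = new Map<string, string>([
  ["1", "Homo_sapiens"],
  ["2", "Pan_troglodytes"],
]);
const resolvedLabels = ["1", "2", "Gorilla_gorilla"].map((id) =>
  resolveTipLabel(exampleTranslate, id),
);
```

The fallback keeps labels that never went through TRANSLATE unchanged instead of turning them into undefined.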
Writers
```ts
import { writeNewick, writeNexus } from "insomni-phylo";

const newick = writeNewick(tree, { annotations });
const nexus = writeNexus(doc, { useTranslate: true });
```

writeNewick(tree, options?) walks iteratively (stack-safe on huge trees),
single-quotes names with special characters, and never emits a root branch
length. WriteNewickOptions:

- annotations: emit [&k=v,…] after each node’s label for columns that have a value on that node.
- nameMap: rewrite stored names to emitted names (the NEXUS writer uses this to swap tip labels for numeric TRANSLATE keys).
- wrapWidth: insert a newline after a comma past this many characters (helps legacy apps that freeze on one giant line).
- precision: deprecated and ignored; lengths use Java Double.toString semantics for byte-stable round-trips.
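The single-quoting behaviour follows the usual Newick convention: wrap the label in single quotes and double any quote inside it. The sketch below shows that convention only. The exact character set the library treats as special is not documented here, so NEEDS_QUOTING and quoteNewickLabel are assumptions.

```ts
// Assumed set of characters that force quoting: whitespace, Newick
// structural characters, brackets, quotes, and "=".
const NEEDS_QUOTING = /[\s()\[\]{}'":;,=]/;

// Hypothetical helper: quote a label only when it contains a special
// character, doubling embedded single quotes.
function quoteNewickLabel(name: string): string {
  if (name === "" || !NEEDS_QUOTING.test(name)) return name;
  return `'${name.replace(/'/g, "''")}'`;
}
```

Labels without special characters pass through unchanged, which keeps output readable and avoids gratuitous diffs on round-trip.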
writeNexus(doc, options?) emits only the blocks whose backing maps are
non-empty, and re-emits preserved blocks. WriteNexusOptions has useTranslate
(default true) and wrapWidth.
Three formatting primitives back the round-trip and are exported for direct use:
| Function | Purpose |
|---|---|
| formatJavaDouble(n) | Double.toString-compatible (5.0, 1.0E-7, NaN). |
| formatJavaInteger(n) | Integer.toString, 32-bit range checked. |
| encodeFigtreeColor(r, g, b, a?) | #rrggbb / #aarrggbb color token. |
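For readers unfamiliar with Double.toString, the rules are roughly: plain decimal notation for magnitudes in [1e-3, 1e7), scientific notation with an uppercase E otherwise, and always at least one digit after the decimal point. The sketch below approximates those rules in TypeScript. It is not formatJavaDouble itself, and it assumes the shortest round-trip digits produced by JavaScript match Java's, which may not hold for every value on older JDKs.

```ts
// Hypothetical approximation of Java's Double.toString (not the library's code).
function javaDoubleToString(x: number): string {
  if (Number.isNaN(x)) return "NaN";
  if (x === Infinity) return "Infinity";
  if (x === -Infinity) return "-Infinity";
  if (x === 0) return Object.is(x, -0) ? "-0.0" : "0.0";

  const sign = x < 0 ? "-" : "";
  const abs = Math.abs(x);

  // toExponential() with no argument yields the shortest digits that
  // uniquely identify the value, e.g. "1.23456e+2".
  const expForm = abs.toExponential();
  const eAt = expForm.indexOf("e");
  const digits = expForm.slice(0, eAt).replace(".", "");
  const exp = Number(expForm.slice(eAt + 1));

  if (abs >= 1e-3 && abs < 1e7) {
    // Plain decimal notation, with at least one fractional digit.
    if (exp >= 0) {
      const padded = digits.padEnd(exp + 1, "0");
      const frac = padded.slice(exp + 1);
      return `${sign}${padded.slice(0, exp + 1)}.${frac === "" ? "0" : frac}`;
    }
    return `${sign}0.${"0".repeat(-exp - 1)}${digits}`;
  }

  // Scientific notation: one leading digit, "E", exponent without a plus sign.
  const frac = digits.slice(1);
  return `${sign}${digits.charAt(0)}.${frac === "" ? "0" : frac}E${exp}`;
}
```

The visible difference from JavaScript's own String(n) is the trailing .0 on whole numbers and the E notation, which is why a dedicated formatter is needed for byte-stable output.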
Lossless editing via phylo-cst
parseNewick / parseNexus produce a semantic model: great for layout and
rendering, but they normalize whitespace, comment placement, and number
formatting. When the goal is to edit a file in place and write back exactly
what was there minus your edit, use @phylon/parser
instead — a concrete syntax tree that retains every token, comment, and byte of
formatting for surgical, lossless round-trips.
Rule of thumb: parse with insomni-phylo to understand a tree, parse with
@phylon/parser to rewrite one without disturbing the rest of the file.