Corrected Architecture: tree_haver + Citrus Backend
Key Correction
tree_haver remains completely grammar-agnostic!
tree_haver does NOT know about TOML, JSON, YAML, etc.
It only provides:
- Unified Node interface
- Backend abstraction (MRI/Rust/FFI/Java/Citrus)
- Generic grammar loading mechanism
Correct Architecture
┌─────────────────────────────────────────────────────────────────┐
│  toml-merge (TOML-SPECIFIC)                                     │
│                                                                 │
│  • Depends on toml-rb (TOML Citrus grammar)                     │
│  • Depends on tree_haver (unified interface)                    │
│  • Provides TOML semantics (table?, key_name, etc.)             │
│  • Works with ANY tree_haver backend                            │
└─────────────────────────────────────────────────────────────────┘
                                 ↓
                       Uses tree_haver's API
                                 ↓
┌─────────────────────────────────────────────────────────────────┐
│  tree_haver (GRAMMAR-AGNOSTIC)                                  │
│                                                                 │
│  • Unified Node interface (type, text, children, etc.)          │
│  • Backend selection (auto, mri, rust, ffi, java, citrus)       │
│  • Grammar loading abstraction                                  │
│  • NO knowledge of specific languages                           │
└─────────────────────────────────────────────────────────────────┘
                                 ↓
                   Delegates to selected backend
                                 ↓
                 ┌──────────────────────────────┐
                 │                              │
                 ↓                              ↓
      ┌─────────────────────┐        ┌──────────────────────┐
      │ Tree-sitter Backends│        │ Citrus Backend       │
      │                     │        │                      │
      │ • MRI               │        │ • Generic Citrus     │
      │ • Rust              │        │   wrapper            │
      │ • FFI               │        │ • Accepts any        │
      │ • Java              │        │   Citrus grammar     │
      │                     │        │ • NO TOML knowledge  │
      │ All use tree-sitter │        │                      │
      │ with language libs  │        │ Uses Citrus parser   │
      └─────────────────────┘        └──────────────────────┘
                 ↓                              ↓
        libtree-sitter-toml          Citrus + grammar module
         (native library)             (provided by toml-rb)
What Each Layer Does
Layer 1: tree_haver (Generic Parser Interface)
Purpose: Provide unified API across different parsing backends
What it knows:
- ✅ How to create Node objects
- ✅ How to traverse ASTs
- ✅ How to extract positions/text
- ✅ How to switch backends
What it DOESN’T know:
- ❌ TOML syntax
- ❌ What a “table” is
- ❌ What a “key-value pair” is
- ❌ Any language-specific semantics
Example tree_haver API:
parser = TreeHaver::Parser.new
parser.language = some_grammar # Generic - works with ANY grammar
tree = parser.parse(source)
node = tree.root_node
node.type # => :table (just a symbol from grammar)
node.start_byte # => 0
node.text # => "[section]"
node.children # => [...]
# tree_haver doesn't know what :table means!
# It just provides the data.
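Because every node exposes the same generic accessors, a traversal can be written once without any grammar knowledge. The following is a minimal sketch of that idea; `StubNode`, `each_node`, and `type_counts` are hypothetical names used only for illustration and are not part of tree_haver.

```ruby
# StubNode is a hypothetical stand-in for a tree_haver node. It mirrors only
# the generic accessors shown above (type, text, children).
StubNode = Struct.new(:type, :text, :children)

# Depth-first walk that calls the block for every node. Nothing here depends
# on what a node type means, only on the generic interface.
def each_node(node, &block)
  block.call(node)
  node.children.each { |child| each_node(child, &block) }
end

# Tally how often each node type occurs, whatever the grammar is.
def type_counts(root)
  counts = Hash.new(0)
  each_node(root) { |node| counts[node.type] += 1 }
  counts
end

root = StubNode.new(:document, "[section]\nkey = 1\n", [
  StubNode.new(:table, "[section]", []),
  StubNode.new(:pair, "key = 1", [
    StubNode.new(:bare_key, "key", []),
    StubNode.new(:integer, "1", []),
  ]),
])

counts = type_counts(root)
```

The same walk would apply unchanged to a JSON or YAML tree, since it never inspects what a type symbol stands for.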
Layer 2: toml-merge (TOML Semantics)
Purpose: Understand TOML structure and provide merge logic
What it knows:
- ✅ TOML syntax and semantics
- ✅ What :table means (it’s a TOML section)
- ✅ What :keyvalue/:pair means (TOML key-value)
- ✅ How to extract table names
- ✅ How to merge TOML files
What it DOESN’T know:
- ❌ Which backend tree_haver is using
- ❌ How parsing actually works
- ❌ Tree-sitter vs Citrus details
Example toml-merge API:
analysis = Toml::Merge::FileAnalysis.new(source)
# Internally uses tree_haver, doesn't care which backend
node = analysis.statements.first
node.table? # => true (TOML-specific method)
node.table_name # => "section" (TOML-specific extraction)
# toml-merge adds TOML understanding on top of tree_haver!
Backend Comparison
Tree-sitter Backends (MRI/Rust/FFI/Java)
# tree_haver loads tree-sitter grammar
language = TreeHaver::Language.load("toml", "/path/to/libtree-sitter-toml.so")
parser = TreeHaver::Parser.new
parser.language = language
tree = parser.parse(source)
# Returns nodes with types like:
# :table, :pair, :array, :string, etc.
# (from tree-sitter TOML grammar)
Citrus Backend (NEW)
# tree_haver accepts Citrus grammar module
parser = TreeHaver::Parser.new
parser.language = TomlRB::Document # Citrus grammar module
tree = parser.parse(source)
# Returns nodes with types like:
# :table, :keyvalue, :array, :string, etc.
# (from Citrus TOML grammar - slightly different names!)
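The "accepts any grammar" contract can be pictured as plain duck typing: anything that responds to `parse` can be plugged in. The sketch below is an illustration under that assumption; `MiniParser` and `LineGrammar` are hypothetical stand-ins, not tree_haver or toml-rb classes.

```ruby
# MiniParser is a hypothetical illustration of the duck-typed contract:
# any object that responds to #parse can act as the grammar.
class MiniParser
  def language=(grammar)
    unless grammar.respond_to?(:parse)
      raise ArgumentError, "Grammar must respond to :parse"
    end

    @grammar = grammar
  end

  def parse(source)
    raise "No grammar loaded" unless @grammar

    @grammar.parse(source)
  end
end

# Stand-in grammar that splits the source into lines. A real Citrus grammar
# module would return a Citrus::Match instead.
module LineGrammar
  def self.parse(source)
    source.lines.map(&:chomp)
  end
end

parser = MiniParser.new
parser.language = LineGrammar
result = parser.parse("[section]\nkey = 1\n")
```

The parser never asks which language the grammar describes, which is what keeps this layer grammar-agnostic.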
Key Design Principle
tree_haver provides transport, toml-merge provides interpretation
Analogy: HTTP vs Web Application
HTTP (tree_haver):
- Transports bytes
- Doesn't know about JSON, HTML, etc.
- Just provides: headers, body, status
Web App (toml-merge):
- Interprets JSON/HTML
- Knows what data means
- Uses HTTP for transport
Same with parsing:
tree_haver:
- Transports AST nodes
- Doesn't know about TOML, JSON, etc.
- Just provides: type, text, children
toml-merge:
- Interprets TOML structure
- Knows what nodes mean
- Uses tree_haver for parsing
Implementation Plan - UPDATED
Phase 1: Refactor to use tree_haver (THIS FIRST!)
Goal: Make toml-merge use tree_haver’s existing backends
# BEFORE: Direct tree-sitter usage
require "tree_sitter"
language = TreeSitter::Language.load(...)
parser = TreeSitter::Parser.new
parser.language = language
# AFTER: Use tree_haver
require "tree_haver"
# tree_haver auto-detects best backend (MRI/Rust/FFI/Java)
# and handles language loading via GrammarFinder
parser = TreeHaver::Parser.new
parser.language = TreeHaver::Language.toml
Changes needed:
- Update FileAnalysis to use TreeHaver::Parser
- Update NodeWrapper to work with TreeHaver::Node
- Remove direct tree-sitter references
- Add tree_haver dependency
- Test with all tree_haver backends
Benefits:
- ✅ Works on JRuby (via FFI or Java backend)
- ✅ Works on TruffleRuby (via FFI)
- ✅ Can use Rust backend (tree_stump)
- ✅ Automatically picks best backend
- ✅ Sets foundation for Citrus backend
Phase 2: Add Citrus backend to tree_haver
Goal: Add Citrus as another backend option in tree_haver
In tree_haver:
# tree_haver/lib/tree_haver/backends/citrus.rb
module TreeHaver::Backends
  module Citrus
    class Node
      def initialize(match, source)
        @match = match # Generic Citrus::Match
        @source = source
      end

      def type
        # Extract from events[0] - GENERIC
        @match.events.first.is_a?(Symbol) ? @match.events.first : :unknown
      end

      def start_byte; @match.offset; end
      def end_byte; @match.offset + @match.length; end
      # calculate_point is not defined in this file; position calculation
      # is planned for point.rb
      def start_point; calculate_point(@match.offset); end
      def end_point; calculate_point(@match.offset + @match.length); end
      def text; @match.string; end

      def children
        @match.matches.map { |m| Node.new(m, @source) }
      end
      # NO TOML KNOWLEDGE!
    end

    class Parser
      def initialize
        @grammar = nil
      end

      # Accept any Citrus grammar module
      def language=(grammar_module)
        unless grammar_module.respond_to?(:parse)
          raise ArgumentError, "Grammar must respond to :parse"
        end
        @grammar = grammar_module
      end

      def parse(source)
        raise "No grammar loaded" unless @grammar
        parsed = @grammar.parse(source)
        Tree.new(parsed, source)
      rescue ::Citrus::ParseError => e
        # The leading :: reaches the citrus gem. A bare Citrus here would
        # resolve to TreeHaver::Backends::Citrus and raise NameError.
        raise TreeHaver::ParseError, e.message
      end
    end

    class Tree
      def initialize(citrus_match, source)
        @root = Node.new(citrus_match, source)
      end

      def root_node
        @root
      end
    end

    # True when the citrus gem can be loaded
    def self.available?
      require "citrus"
      true
    rescue LoadError
      false
    end

    def self.capabilities
      {
        backend: :citrus,
        parse: true,
        query: false, # Citrus doesn't have query API
        bytes_field: true,
        incremental: false,
      }
    end
  end
end
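The `start_point` and `end_point` methods above rely on a `calculate_point` helper that is not shown. One possible shape for it is sketched below. The two-argument signature and the `{ row:, column: }` return value are assumptions for illustration, and the offset is treated as a character index into the source; a byte-based variant would use `byteslice` instead.

```ruby
# Hypothetical helper: convert an offset into a zero-based row/column pair.
# The { row:, column: } return shape is an assumption for illustration.
def calculate_point(source, offset)
  before = source[0, offset]          # everything preceding the offset
  row = before.count("\n")            # number of newlines crossed so far
  last_newline = before.rindex("\n")  # nil while still on the first row
  column = last_newline ? offset - last_newline - 1 : offset
  { row: row, column: column }
end

source = "[section]\nkey = 1\n"
start_point = calculate_point(source, 0)   # start of "[section]"
key_point = calculate_point(source, 10)    # start of "key"
value_point = calculate_point(source, 16)  # the "1" after "key = "
```

Scanning the prefix on every call is linear in the offset; a precomputed table of line starts would be a reasonable refinement if positions are requested often.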
Phase 3: Make toml-merge work with Citrus backend
Goal: toml-merge automatically works when tree_haver uses Citrus
In toml-merge:
# Gemfile
gem "tree_haver"
gem "toml-rb", require: false # Loaded lazily; only used by the Citrus backend
# lib/toml/merge/file_analysis.rb
class FileAnalysis
  def initialize(source, **options)
    @source = source
    @parser = TreeHaver::Parser.new
    # Load appropriate grammar based on backend.
    # Compare with ==, not case/when: `when SomeModule` uses Module#===,
    # which tests for instances of the module and never matches the
    # module object itself.
    if TreeHaver.backend_module == TreeHaver::Backends::Citrus
      # Citrus backend: load toml-rb grammar
      require "toml-rb"
      @parser.language = TomlRB::Document
    else
      # Tree-sitter backends: use registered TOML language
      @parser.language = TreeHaver::Language.toml
    end
    @tree = @parser.parse(source)
    # ... rest works the same!
  end
end
# lib/toml/merge/node_wrapper.rb
class NodeWrapper
  def initialize(node, **options)
    @node = node # TreeHaver::Node (works with any backend!)
    # ... TOML-specific logic
  end

  def table?
    # Only :table counts. :array_of_tables (tree-sitter) and
    # :table_array (Citrus) are a different construct.
    @node.type == :table
  end

  def table_name
    return unless table?
    # Extract from node structure - works with both backends!
    # tree-sitter and Citrus might have slightly different structures
    # but both expose :table type with name information
    extract_table_name_from_node(@node)
  end
end
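`extract_table_name_from_node` is left undefined above. A backend-independent way to implement it is to read the name from the node's text rather than from its child structure. The sketch below assumes exactly that; it is not taken from toml-merge, `HeaderNode` is a hypothetical stand-in, and it does not handle quoted keys that contain brackets.

```ruby
# HeaderNode is a hypothetical stand-in exposing only the generic accessors.
HeaderNode = Struct.new(:type, :text)

# Assumed implementation: read the header line and drop the brackets.
# Returns nil when the text does not start with a single-bracket header.
def extract_table_name_from_node(node)
  header = node.text.lines.first.to_s
  header[/\A\s*\[\s*([^\[\]]+?)\s*\]/, 1]
end

simple = extract_table_name_from_node(HeaderNode.new(:table, "[section]"))
dotted = extract_table_name_from_node(HeaderNode.new(:table, "[ a.b ]\nkey = 1\n"))
array_header = extract_table_name_from_node(HeaderNode.new(:table_array, "[[items]]"))
```

Working from `text` sidesteps structural differences between the tree-sitter and Citrus trees, at the cost of re-parsing a small piece of syntax by hand.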
Type Name Mapping
The main challenge: tree-sitter and Citrus use slightly different names
Tree-sitter TOML grammar:
table [section]
array_of_tables [[items]]
pair key = value
string "value"
integer 42
array [1, 2, 3]
Citrus TOML grammar (toml-rb):
table [section]
table_array [[items]]
keyvalue key = value
basic_string "value"
integer 42
array [1, 2, 3]
Solution in toml-merge:
def normalize_type(type)
  case type
  when :keyvalue then :pair
  when :table_array then :array_of_tables
  when :basic_string, :literal_string then :string
  else type
  end
end
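As a quick check of the mapping, the type names listed for each grammar can be run through `normalize_type` to confirm that they collapse into one vocabulary. The snippet repeats the method so that it runs on its own; the two type lists are copied from the tables above and are not an exhaustive inventory of either grammar.

```ruby
# Same mapping as above, repeated so this snippet runs on its own.
def normalize_type(type)
  case type
  when :keyvalue then :pair
  when :table_array then :array_of_tables
  when :basic_string, :literal_string then :string
  else type
  end
end

# Type names as listed for each grammar in this document.
tree_sitter_types = %i[table array_of_tables pair string integer array]
citrus_types = %i[table table_array keyvalue basic_string integer array]

normalized_tree_sitter = tree_sitter_types.map { |type| normalize_type(type) }
normalized_citrus = citrus_types.map { |type| normalize_type(type) }
```

Tree-sitter names pass through untouched, so downstream TOML logic only ever needs to handle the tree-sitter vocabulary.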
Dependencies
tree_haver
# tree_haver.gemspec
spec.add_dependency "citrus", "~> 3.0" # Pure-Ruby fallback (Citrus backend)
# The tree-sitter backends are optional; install whichever suits the platform:
# - ruby_tree_sitter (MRI)
# - tree_stump (Rust)
# - ffi (FFI)
# - java-tree-sitter (Java)
# tree_haver picks the best one available and falls back to Citrus
toml-merge
# toml-merge.gemspec
spec.add_dependency "tree_haver", "~> 1.0"
spec.add_dependency "toml-rb", "~> 3.0" # For Citrus grammar
# toml-rb provides:
# 1. TOML Citrus grammar (TomlRB::Document)
# 2. Semantic layer (TomlRB::Table, etc.) - we might use this
Usage Examples
Auto-select (default)
require "toml-merge"
# tree_haver picks best backend:
# - MRI: ruby_tree_sitter (if available)
# - Rust: tree_stump (if available)
# - JRuby: java-tree-sitter or FFI
# - Fallback: Citrus (pure Ruby)
analysis = Toml::Merge::FileAnalysis.new(source)
Force Citrus (pure Ruby)
require "toml-merge"
TreeHaver.backend = :citrus
analysis = Toml::Merge::FileAnalysis.new(source)
Environment variable
export TREE_HAVER_BACKEND=citrus
ruby my_script.rb
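The three selection modes above (auto, forced in code, forced through the environment) can be reduced to one resolution step. The sketch below illustrates that logic only; `PREFERENCE`, `resolve_backend`, and the ordering itself are assumptions, and tree_haver's real selection code may differ.

```ruby
# Hypothetical preference order for auto-selection. The real order is
# decided by tree_haver and may differ.
PREFERENCE = %i[mri rust ffi java citrus].freeze

# Pick a backend from an explicit request (for example the value of
# ENV["TREE_HAVER_BACKEND"]) or, when the request is empty or "auto",
# the first preferred backend that is available.
def resolve_backend(requested, available)
  name = requested.to_s.strip.downcase
  if name.empty? || name == "auto"
    return PREFERENCE.find { |backend| available.include?(backend) }
  end

  backend = name.to_sym
  unless available.include?(backend)
    raise ArgumentError, "backend #{backend.inspect} is not available"
  end
  backend
end

forced = resolve_backend("citrus", %i[mri citrus])  # explicit request wins
auto = resolve_backend(nil, %i[ffi citrus])          # first available by preference
fallback = resolve_backend("auto", %i[citrus])       # pure-Ruby fallback
```

Raising on an unavailable explicit request, rather than silently falling back, makes a misconfigured environment variable visible early.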
Benefits of This Architecture
1. Clean Separation
- tree_haver: Generic parsing mechanics
- toml-merge: TOML-specific semantics
- No grammar knowledge in tree_haver!
2. Reusability
- tree_haver’s Citrus backend works for ANY Citrus grammar
- json-merge could use it with a JSON Citrus grammar
- yaml-merge could use it with a YAML Citrus grammar
- bash-merge could use it with a Bash Citrus grammar
3. Flexibility
- toml-merge works with ALL tree_haver backends
- Users can choose backend based on their needs
- Graceful fallback to pure Ruby
4. Maintainability
- tree_haver handles parsing infrastructure
- toml-merge focuses on TOML logic
- Clear boundaries
What Gets Built Where
tree_haver gains:
lib/tree_haver/backends/
  citrus/
    node.rb    # Generic Citrus::Match wrapper
    parser.rb  # Generic grammar loading
    tree.rb    # Tree structure
    point.rb   # Position calculation
toml-merge keeps:
lib/toml/merge/
  file_analysis.rb      # Uses TreeHaver::Parser
  node_wrapper.rb       # TOML semantics on TreeHaver::Node
  smart_merger.rb       # TOML merge logic
  conflict_resolver.rb  # TOML conflict handling
Summary
Key Points:
- ✅ tree_haver remains grammar-agnostic
- ✅ Citrus backend in tree_haver is generic (no TOML knowledge)
- ✅ toml-merge provides TOML-specific logic (works with any backend)
- ✅ toml-rb provides the TOML Citrus grammar
- ✅ Architecture is clean and reusable
Implementation Order:
- FIRST: Refactor toml-merge to use tree_haver (replace direct tree-sitter)
- SECOND: Add Citrus backend to tree_haver (generic wrapper)
- THIRD: Ensure toml-merge works with Citrus backend
This is the correct architecture! 🎯