# Architecture Vision: Dual Backend System

## Current State (Tree-sitter only)

```
toml-merge/
  lib/toml/merge/
    file_analysis.rb      # Directly uses TreeSitter
    node_wrapper.rb       # Wraps TreeSitter::Node
    smart_merger.rb
    conflict_resolver.rb
```

Dependencies:

- tree_sitter (gem)
- libtree-sitter-toml.so (native)

Problems:

- Native library dependency
- Installation can fail
- JRuby incompatible (without complex FFI)
- Limited platform support

## Future State: Stage 1 (Dual Backend in toml-merge)

```
toml-merge/
  lib/toml/merge/
    config.rb                # NEW: Backend selection
    file_analysis.rb         # UPDATED: Backend-aware
    node_wrapper.rb          # UPDATED: Backend-aware
    backends/                # NEW: Backend system
      backend_adapter.rb     # Abstract interface
      tree_sitter/           # Existing code refactored
        parser.rb
        node_adapter.rb
      citrus/                # NEW: Pure Ruby backend
        parser.rb            # TomlRB::Document.parse
        match_wrapper.rb     # Generic Citrus mechanics (marked)
        node_adapter.rb      # TOML semantics
```

Usage:

```ruby
# Auto-select (prefers tree-sitter)
analysis = FileAnalysis.new(source)

# Force Citrus (pure Ruby). Set this before the backend is first read:
# Toml::Merge.backend memoizes the value it finds.
ENV["TOML_MERGE_BACKEND"] = "citrus"
analysis = FileAnalysis.new(source)

# Programmatic
Toml::Merge.backend = :citrus
analysis = FileAnalysis.new(source)
```

Benefits:

Benefits:
- ✅ Works everywhere (fallback to pure Ruby)
- ✅ Graceful degradation
- ✅ No installation failures caused by a missing native library

## Future State: Stage 2 (After Extraction)

```
tree_haver/                  # Generic Citrus support
  lib/tree_haver/
    backends/
      citrus/                # EXTRACTED: Generic parts
        node.rb              # Generic Citrus::Match wrapper
        parser.rb            # Grammar loading
        language.rb          # Grammar registration
        point.rb             # Position calculation

toml-merge/                  # TOML-specific logic
  lib/toml/merge/
    backends/
      tree_sitter/           # Uses tree_haver (as before)
        adapter.rb
      citrus/                # SIMPLIFIED: Uses tree_haver
        adapter.rb           # Only TOML semantics now!
```

tree_haver API:

```ruby
# Load any Citrus grammar
language = TreeHaver::Language.from_citrus_grammar(
  path: "path/to/grammar.citrus",
  grammar_module: TomlRB::Document
)
parser = TreeHaver::Parser.new
parser.language = language
tree = parser.parse(source)

# Generic node interface (works for ANY grammar)
node = tree.root_node
node.type         # => :table (from grammar rule name)
node.start_byte   # => 0
node.end_byte     # => 23
node.start_point  # => {row: 0, column: 0}
node.text         # => "[section]"
node.children     # => [...]
```
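
Because every node exposes `type` and `children`, a tree walk can be written once and reused for any grammar. The following is a minimal sketch only: `FakeNode` is a hypothetical stand-in for a tree_haver node, and `each_node` is an illustrative helper, not part of the proposed tree_haver API.

```ruby
# Stand-in for a tree_haver node; only `type` and `children` are assumed.
FakeNode = Struct.new(:type, :children)

# Depth-first, pre-order walk over anything that responds to #children.
def each_node(node, depth = 0, &block)
  block.call(node, depth)
  node.children.each { |child| each_node(child, depth + 1, &block) }
end

tree = FakeNode.new(:document, [
  FakeNode.new(:table, [FakeNode.new(:keyvalue, [])]),
  FakeNode.new(:keyvalue, [])
])

# Print an indented outline of the tree.
each_node(tree) { |node, depth| puts "#{'  ' * depth}#{node.type}" }
```

The walk depends only on the generic interface, so the same helper would serve a JSON or YAML grammar without change.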

toml-merge usage:

```ruby
# Same as Stage 1, but implementation simpler
analysis = FileAnalysis.new(source, backend: :citrus)

# Now powered by tree_haver's generic Citrus backend
node = analysis.statements.first
node.table?      # => true (TOML-specific method)
node.table_name  # => "section" (TOML-specific extraction)
```

Benefits:

Benefits:
- ✅ All Stage 1 benefits
- ✅ Plus: Cleaner code in toml-merge
- ✅ Plus: Other gems can reuse tree_haver’s Citrus backend
- ✅ Plus: Foundation for Citrus grammar ecosystem

## Code Examples: How It Works

### Stage 1: Backend Selection

```ruby
# config.rb
module Toml::Merge
  class << self
    # Explicit setting first, then TOML_MERGE_BACKEND, then :auto.
    def backend
      @backend ||= ENV["TOML_MERGE_BACKEND"]&.to_sym || :auto
    end

    def backend=(name)
      @backend = name&.to_sym
    end

    def backend_module
      case backend
      when :tree_sitter
        Backends::TreeSitter
      when :citrus
        Backends::Citrus
      else # :auto
        if Backends::TreeSitter.available?
          Backends::TreeSitter
        elsif Backends::Citrus.available?
          Backends::Citrus
        else
          raise "No TOML parsing backend available"
        end
      end
    end
  end
end
```
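
The selection code above calls `available?` on each backend without showing how availability is detected. One possible approach, sketched here under assumptions, is to attempt the `require` and treat `LoadError` as "not available". `BackendProbe` is a hypothetical helper, and the feature names in `CANDIDATES` are guesses at what each backend would require.

```ruby
# Hypothetical helper; not part of toml-merge.
module BackendProbe
  # Backend name => feature to require. Both feature names are assumptions.
  CANDIDATES = { tree_sitter: "tree_sitter", citrus: "toml-rb" }.freeze

  module_function

  # True if the feature can be loaded in this Ruby.
  def available?(feature)
    require feature
    true
  rescue LoadError
    false
  end

  # First backend whose feature loads, in hash order (preferred first).
  def first_available(candidates = CANDIDATES)
    name, _feature = candidates.find { |_name, feature| available?(feature) }
    name or raise LoadError, "No TOML parsing backend available"
  end
end

puts BackendProbe.available?("set") # a stdlib feature, so this loads
```

Ordering the candidates hash with the preferred backend first keeps the "prefer tree-sitter, fall back to Citrus" rule in one place.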

### Stage 1: Generic Citrus Wrapper (marked for extraction)

```ruby
# backends/citrus/match_wrapper.rb
# GENERIC - Can move to tree_haver later
module Toml::Merge::Backends::Citrus
  class MatchWrapper
    def initialize(match, source)
      @match = match
      @source = source
    end

    # Type from events[0] - GENERIC
    def type
      return :unknown unless @match.respond_to?(:events)

      @match.events.first.is_a?(Symbol) ? @match.events.first : :unknown
    end

    # Position info - GENERIC
    def start_byte
      @match.offset
    end

    def end_byte
      @match.offset + @match.length
    end

    def start_point
      calculate_point(@match.offset)
    end

    def end_point
      calculate_point(@match.offset + @match.length)
    end

    # Text extraction - GENERIC
    def text
      @match.string
    end

    # Child traversal - GENERIC
    def children
      return [] unless @match.respond_to?(:matches)

      @match.matches.map { |m| MatchWrapper.new(m, @source) }
    end

    # Captures - GENERIC
    def captures
      @match.captures
    end

    private

    # Converts an offset into a zero-based { row:, column: } pair.
    # Searching only the text before the offset keeps offset 0 correct;
    # rindex("\n", offset - 1) would search from the end of the string
    # when offset is 0.
    def calculate_point(offset)
      before = @source[0...offset]
      last_newline = before.rindex("\n")
      column = last_newline ? offset - last_newline - 1 : offset
      { row: before.count("\n"), column: column }
    end
  end
end
```
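
`calculate_point` rescans the source on every call. If position lookups turn out to be frequent, one alternative is to record each line's starting offset once and binary-search that table. This is a sketch of that idea, not code from toml-merge; `LineIndex` is a hypothetical name.

```ruby
# Hypothetical helper: maps character offsets to zero-based row/column.
class LineIndex
  def initialize(source)
    # Offset at which each line begins; line 0 always begins at 0.
    @line_starts = [0]
    source.each_char.with_index { |ch, i| @line_starts << i + 1 if ch == "\n" }
  end

  def point(offset)
    # Index of the first line that starts after the offset, if any.
    following = @line_starts.bsearch_index { |start| start > offset }
    row = following ? following - 1 : @line_starts.size - 1
    { row: row, column: offset - @line_starts[row] }
  end
end

index = LineIndex.new("[section]\nkey = 'value'")
p index.point(0)  # start of "[section]"
p index.point(10) # start of "key"
```

Building the table is linear in the source size, and each lookup afterwards is logarithmic in the number of lines.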

### Stage 1: TOML-Specific Adapter

```ruby
# backends/citrus/node_adapter.rb
# TOML-SPECIFIC - Stays in toml-merge
module Toml::Merge::Backends::Citrus
  class NodeAdapter
    def initialize(wrapped_match)
      @wrapped = wrapped_match
      @match = wrapped_match.instance_variable_get(:@match)
    end

    # Delegate generic methods
    def type; @wrapped.type; end
    def start_byte; @wrapped.start_byte; end
    def end_byte; @wrapped.end_byte; end
    def text; @wrapped.text; end
    def children; @wrapped.children.map { |c| NodeAdapter.new(c) }; end

    # TOML-specific type checks
    def table?
      type == :table
    end

    def array_of_tables?
      type == :table_array
    end

    def pair?
      type == :keyvalue
    end

    # TOML-specific extraction
    def table_name
      return unless table? || array_of_tables?

      # Use toml-rb's semantic layer
      if @match.respond_to?(:value) && @match.value.respond_to?(:full_key)
        @match.value.full_key
      end
    end

    def key_name
      return unless pair?

      if @match.respond_to?(:value) && @match.value.respond_to?(:dotted_keys)
        @match.value.dotted_keys.join(".")
      end
    end

    def value_node
      return unless pair?

      # Get value from captures
      if @wrapped.captures[:v]
        val_match = @wrapped.captures[:v].first
        NodeAdapter.new(MatchWrapper.new(val_match, @wrapped.instance_variable_get(:@source)))
      end
    end
  end
end
```
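
The adapter reaches into the wrapper with `instance_variable_get`, which ties it to the wrapper's internals. A possible alternative, sketched below with stand-in classes, is for the generic wrapper to expose a public reader and for the adapter to delegate with Ruby's standard `Forwardable`. `GenericNode` and `TomlNode` are hypothetical names used only for this illustration.

```ruby
require "forwardable"

# Stand-in for the generic wrapper, with the raw match exposed as a reader.
GenericNode = Struct.new(:type, :start_byte, :end_byte, :text, :match)

# Stand-in for the TOML-specific adapter.
class TomlNode
  extend Forwardable

  # Generic methods are forwarded; no instance_variable_get needed.
  def_delegators :@node, :type, :start_byte, :end_byte, :text, :match

  def initialize(node)
    @node = node
  end

  def table?
    type == :table
  end

  def pair?
    type == :keyvalue
  end
end

node = TomlNode.new(GenericNode.new(:table, 0, 9, "[section]", nil))
puts node.table?
puts node.text
```

With a public reader in place, the Stage 2 move to tree_haver would not require the adapter to know which instance variables the generic node uses.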

### Stage 2: After Extraction to tree_haver

```ruby
# tree_haver/lib/tree_haver/backends/citrus/node.rb
module TreeHaver::Backends::Citrus
  class Node
    # Exact same code as MatchWrapper from Stage 1
    # Just moved location!
    def initialize(match, source)
      @match = match
      @source = source
    end

    def type
      return :unknown unless @match.respond_to?(:events)

      @match.events.first.is_a?(Symbol) ? @match.events.first : :unknown
    end
    # ... all the generic methods
  end
end

# toml-merge/lib/toml/merge/backends/citrus/adapter.rb
module Toml::Merge::Backends::Citrus
  class Adapter
    def initialize(tree_haver_node)
      @node = tree_haver_node
      @match = tree_haver_node.instance_variable_get(:@match)
    end

    # Delegate generic methods to tree_haver
    def type; @node.type; end
    def start_byte; @node.start_byte; end
    def text; @node.text; end
    # ...

    # TOML-specific logic (same as Stage 1)
    def table?; type == :table; end
    # table_name, key_name, ...: bodies unchanged from Stage 1
  end
end
```

### Signature Generation: Backend-Agnostic

```ruby
# node_wrapper.rb (works with both backends)
class NodeWrapper
  def initialize(node, backend:, **options)
    @backend = backend
    @adapter =
      case backend
      when :tree_sitter then Backends::TreeSitter::NodeAdapter.new(node)
      when :citrus then Backends::Citrus::NodeAdapter.new(node)
      else raise ArgumentError, "unknown backend: #{backend.inspect}"
      end
  end

  def signature
    case @adapter.type
    when :table
      [:table, table_name]
    when :array_of_tables, :table_array # :table_array is the Citrus adapter's type
      [:array_of_tables, table_name]
    when :keyvalue, :pair
      [:pair, key_name]
    # ... etc
    end
  end

  # All semantic methods work regardless of backend
  def table?; @adapter.table?; end
  def table_name; @adapter.table_name; end
  def key_name; @adapter.key_name; end
end
```
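
Signatures are only backend-agnostic if both backends end up with the same type names. The Citrus adapter checks for `:table_array` and `:keyvalue`, while the signature uses `:array_of_tables` and `:pair`. One way to keep this consistent is a single lookup table from backend-specific names to canonical ones. This is a sketch: `CANONICAL_TYPES` and `signature_for` are hypothetical names, and the table lists only the type names that appear in this document.

```ruby
# Backend-specific type names mapped to one canonical name.
CANONICAL_TYPES = {
  table: :table,
  table_array: :array_of_tables,
  array_of_tables: :array_of_tables,
  keyvalue: :pair,
  pair: :pair
}.freeze

# Returns [canonical_type, name], or nil for types that carry no signature.
def signature_for(type, name)
  canonical = CANONICAL_TYPES[type]
  canonical && [canonical, name]
end

p signature_for(:table_array, "servers")
p signature_for(:keyvalue, "title")
```

Two nodes for the same TOML entry then compare equal by signature regardless of which backend parsed each file.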
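
### FileAnalysis: Backend Selection

```ruby
# file_analysis.rb
class FileAnalysis
  def initialize(source, backend: nil, **options)
    @source = source
    @backend = resolve_backend(backend)
    # Parse using selected backend
    @ast = @backend.parse(source)
    @statements = integrate_nodes
  end

  private

  # Accepts nil (configured default), a backend name, or a backend module.
  def resolve_backend(backend)
    case backend
    when nil then Toml::Merge.backend_module
    when :tree_sitter then Backends::TreeSitter
    when :citrus then Backends::Citrus
    else backend
    end
  end

  def integrate_nodes
    # Compare by identity. `case @backend when Backends::Citrus` would use
    # Module#===, which is false when @backend is the module itself.
    if @backend.equal?(Backends::TreeSitter)
      # Existing tree-sitter logic
      integrate_tree_sitter_nodes
    elsif @backend.equal?(Backends::Citrus)
      # New citrus logic
      integrate_citrus_nodes
    end
  end

  def integrate_citrus_nodes
    return [] unless @ast

    result = []
    @ast.matches.each do |match|
      next if match.value.nil? # Skip whitespace

      # NodeWrapper builds the NodeAdapter itself, so pass the generic wrapper.
      wrapper = Backends::Citrus::MatchWrapper.new(match, @source)
      result << NodeWrapper.new(wrapper, backend: :citrus, source: @source)
    end
    result.sort_by { |node| node.start_line || 0 }
  end
end
```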

## Testing Strategy

### Stage 1: Backend-Specific Tests

```ruby
# spec/toml/merge/backends/tree_sitter_spec.rb
RSpec.describe Toml::Merge::Backends::TreeSitter do
  it "parses TOML correctly" do
    # tree-sitter specific tests
  end
end

# spec/toml/merge/backends/citrus_spec.rb
RSpec.describe Toml::Merge::Backends::Citrus do
  it "parses TOML correctly" do
    # citrus specific tests
  end
end
```

### Stage 1: Shared Examples

```ruby
# spec/toml/merge/backends/shared_examples.rb
RSpec.shared_examples "TOML backend" do |backend_name|
  before do
    Toml::Merge.backend = backend_name
  end

  after do
    # Clear the override so later specs use the default selection again.
    Toml::Merge.backend = nil
  end

  it "parses tables" do
    source = "[section]\nkey = 'value'"
    analysis = Toml::Merge::FileAnalysis.new(source)
    expect(analysis.valid?).to be true
    expect(analysis.statements.size).to eq 2
    expect(analysis.statements.first.table?).to be true
  end

  it "generates correct signatures" do
    # Test signature generation works with both backends
  end
end

# Run against both backends
RSpec.describe "TreeSitter backend" do
  include_examples "TOML backend", :tree_sitter
end

RSpec.describe "Citrus backend" do
  include_examples "TOML backend", :citrus
end
```

## Performance Expectations

### Stage 1 Benchmarks

```ruby
# benchmark/backends_comparison.rb
require "benchmark/ips"

toml_sample = File.read("fixtures/large.toml")

Benchmark.ips do |x|
  x.report("tree-sitter") do
    Toml::Merge.backend = :tree_sitter
    Toml::Merge::FileAnalysis.new(toml_sample)
  end

  x.report("citrus") do
    Toml::Merge.backend = :citrus
    Toml::Merge::FileAnalysis.new(toml_sample)
  end

  x.compare!
end
```

Expected Results:

- tree-sitter: Faster (native C)
- citrus: 2-5x slower (acceptable for fallback)

Acceptable If:

- Citrus within 10x of tree-sitter
- Both handle real-world files in < 1 second
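
The two acceptance rules above can be turned into an automated check, for example in CI. This is a minimal sketch using only the standard `benchmark` library. `acceptable?`, `MAX_SLOWDOWN` and `MAX_SECONDS` are hypothetical names, and the timed blocks are stand-in workloads rather than real parsing calls.

```ruby
require "benchmark"

MAX_SLOWDOWN = 10.0 # "Citrus within 10x of tree-sitter"
MAX_SECONDS  = 1.0  # "real-world files in < 1 second"

# True when both acceptance rules hold for the measured wall-clock times.
def acceptable?(tree_sitter_seconds, citrus_seconds)
  slowdown = citrus_seconds.to_f / tree_sitter_seconds
  slowdown <= MAX_SLOWDOWN &&
    tree_sitter_seconds < MAX_SECONDS &&
    citrus_seconds < MAX_SECONDS
end

# Stand-in workloads; replace the blocks with real parsing calls.
fast = Benchmark.realtime { 1_000.times { "a = 1".split("=") } }
slow = Benchmark.realtime { 5_000.times { "a = 1".split("=") } }
puts "acceptable: #{acceptable?(fast, slow)}"
```

Keeping the thresholds as named constants makes the 10x and 1-second limits easy to revisit once real measurements exist.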

## Documentation Plan

### Stage 1 README Updates

````markdown
## Installation

### With tree-sitter (recommended)

Install the tree-sitter TOML parser:

```bash
brew install tree-sitter-toml        # macOS
apt-get install libtree-sitter-toml  # Linux
```

Then:

```bash
gem install toml-merge
```

### Pure Ruby (fallback)

If tree-sitter installation fails, toml-merge automatically
falls back to a pure Ruby parser (slower but works everywhere):

```bash
gem install toml-merge
# Works out of the box!
```

### Selecting Backend

```ruby
# Auto (default): prefers tree-sitter, falls back to pure Ruby
analysis = Toml::Merge::FileAnalysis.new(source)

# Force pure Ruby (for JRuby, TruffleRuby, etc.)
ENV["TOML_MERGE_BACKEND"] = "citrus"
analysis = Toml::Merge::FileAnalysis.new(source)

# Programmatic
Toml::Merge.backend = :citrus
```
````

---

## Migration Path

### For Existing Users (Stage 1)

**No changes required!**

- tree-sitter backend works exactly as before
- New citrus backend is optional
- Auto-selection is seamless

### For New Users (Stage 1)

**Better experience:**

- Installation "just works" (fallback to pure Ruby)
- No native library troubleshooting
- Works on all platforms

### After Extraction (Stage 2)

**Still no breaking changes!**

- Same API
- Same behavior
- Just cleaner implementation

---

## Success Criteria

### Stage 1 Complete When:

- [x] Citrus backend implemented
- [x] All tests passing with both backends
- [x] Performance measured and acceptable
- [x] Documentation updated
- [x] Generic vs specific boundaries documented

### Stage 2 Complete When:

- [x] Generic code extracted to tree_haver
- [x] toml-merge simplified
- [x] All tests still passing
- [x] Documentation updated
- [x] Example for other *-merge gems

---

## Long-term Vision

### Other *-merge Gems Can Follow

```
json-merge/
  backends/
    tree_sitter/    # Native performance
    citrus/         # Pure Ruby via tree_haver

yaml-merge/
  backends/
    tree_sitter/    # Native performance
    citrus/         # Pure Ruby via tree_haver

bash-merge/
  backends/
    tree_sitter/    # Native performance
    citrus/         # Pure Ruby via tree_haver
```

All reuse tree_haver’s infrastructure!

---

## Conclusion

This architecture provides:
- ✅ Immediate value - Pure Ruby fallback for toml-merge
- ✅ Low risk - Staged approach validates before extraction
- ✅ Long-term value - Foundation for entire *-merge ecosystem
- ✅ Clean design - Proper separation of concerns
- ✅ Backward compatible - No breaking changes
- ✅ Future-proof - Easy to add more backends later
Start implementation now! 🚀