Architecture Vision: Dual Backend System

Current State (Tree-sitter only)

toml-merge/
  lib/toml/merge/
    file_analysis.rb        # Directly uses TreeSitter
    node_wrapper.rb         # Wraps TreeSitter::Node
    smart_merger.rb
    conflict_resolver.rb

Dependencies:
  - tree_sitter (gem)
  - libtree-sitter-toml.so (native)

Problems:

  • Native library dependency
  • Installation can fail
  • JRuby incompatible (without complex FFI)
  • Limited platform support

Future State: Stage 1 (Dual Backend in toml-merge)

toml-merge/
  lib/toml/merge/
    config.rb                    # NEW: Backend selection
    file_analysis.rb             # UPDATED: Backend-aware
    node_wrapper.rb              # UPDATED: Backend-aware
    
    backends/                    # NEW: Backend system
      backend_adapter.rb         # Abstract interface
      
      tree_sitter/               # Existing code refactored
        parser.rb
        node_adapter.rb
        
      citrus/                    # NEW: Pure Ruby backend
        parser.rb                # TomlRB::Document.parse
        match_wrapper.rb         # Generic Citrus mechanics (marked)
        node_adapter.rb          # TOML semantics
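
The abstract interface in backend_adapter.rb could look roughly like the sketch below. The module name BackendAdapter and the method set (available?, parse) are assumptions inferred from how backends are called elsewhere in this document, not a settled API; EchoBackend and IncompleteBackend are toy stand-ins.

```ruby
# Hypothetical sketch of backends/backend_adapter.rb.
# Method names mirror calls used elsewhere in this document
# (available?, parse); the module name itself is an assumption.
module BackendAdapter
  # True when the backend's dependencies can be loaded.
  def available?
    raise NotImplementedError, "#{self} must implement available?"
  end

  # Parse source text and return a backend-specific tree.
  def parse(source)
    raise NotImplementedError, "#{self} must implement parse"
  end
end

# Toy backend used only to show the contract; it "parses" by splitting lines.
module EchoBackend
  extend BackendAdapter

  def self.available?
    true
  end

  def self.parse(source)
    source.lines.map(&:chomp)
  end
end

# A backend that forgets to implement parse fails loudly when called.
module IncompleteBackend
  extend BackendAdapter
end
```

Raising NotImplementedError in the shared module means a half-finished backend is caught at the first call rather than silently returning nil.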

Usage:

# Auto-select (prefers tree-sitter)
analysis = FileAnalysis.new(source)

# Force Citrus (pure Ruby). Set this before the backend is first resolved; the choice is cached.
ENV["TOML_MERGE_BACKEND"] = "citrus"
analysis = FileAnalysis.new(source)

# Programmatic
Toml::Merge.backend = :citrus
analysis = FileAnalysis.new(source)

Benefits:

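  • ✅ Works on any platform that runs Ruby (falls back to the pure Ruby backend)
  • ✅ Graceful degradation when the native library is missing
  • ✅ Installation no longer fails because of the native dependency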

Future State: Stage 2 (After Extraction)

tree_haver/                      # Generic Citrus support
  lib/tree_haver/
    backends/
      citrus/                    # EXTRACTED: Generic parts
        node.rb                  # Generic Citrus::Match wrapper
        parser.rb                # Grammar loading
        language.rb              # Grammar registration
        point.rb                 # Position calculation
        
toml-merge/                      # TOML-specific logic
  lib/toml/merge/
    backends/
      tree_sitter/               # Uses tree_haver (as before)
        adapter.rb
        
      citrus/                    # SIMPLIFIED: Uses tree_haver
        adapter.rb               # Only TOML semantics now!

tree_haver API:

# Load any Citrus grammar
language = TreeHaver::Language.from_citrus_grammar(
  path: "path/to/grammar.citrus",
  grammar_module: TomlRB::Document
)

parser = TreeHaver::Parser.new
parser.language = language
tree = parser.parse(source)

# Generic node interface (works for ANY grammar)
node = tree.root_node
node.type        # => :table (from grammar rule name)
node.start_byte  # => 0
node.end_byte    # => 9
node.start_point # => {row: 0, column: 0}
node.text        # => "[section]"
node.children    # => [...]
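
Because the generic interface exposes only type, text, and children, tree-walking code can be written once and reused for any grammar. The sketch below uses a Struct as a stand-in for a tree_haver node; SketchNode and node_types are hypothetical names introduced for illustration.

```ruby
# Stand-in for a tree_haver node: only the generic readers
# (type, text, children) described above are assumed.
SketchNode = Struct.new(:type, :text, :children)

# Depth-first, pre-order walk that records [depth, type] for every node.
def node_types(node, depth = 0, out = [])
  out << [depth, node.type]
  node.children.each { |child| node_types(child, depth + 1, out) }
  out
end

tree = SketchNode.new(:document, "", [
  SketchNode.new(:table, "[section]", []),
  SketchNode.new(:keyvalue, "key = 'value'", [
    SketchNode.new(:key, "key", []),
    SketchNode.new(:string, "'value'", [])
  ])
])

node_types(tree).each { |depth, type| puts "#{'  ' * depth}#{type}" }
```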

toml-merge usage:

# Same as Stage 1, but implementation simpler
analysis = FileAnalysis.new(source, backend: :citrus)

# Now powered by tree_haver's generic Citrus backend
node = analysis.statements.first
node.table?      # => true (TOML-specific method)
node.table_name  # => "section" (TOML-specific extraction)

Benefits:

  • ✅ All Stage 1 benefits
  • ✅ Plus: Cleaner code in toml-merge
  • ✅ Plus: Other gems can reuse tree_haver’s Citrus backend
  • ✅ Plus: Foundation for Citrus grammar ecosystem

Code Examples: How It Works

Stage 1: Backend Selection

# config.rb
module Toml::Merge
  class << self
    def backend
      @backend ||= ENV["TOML_MERGE_BACKEND"]&.to_sym || :auto
    end
    
    def backend=(name)
      @backend = name&.to_sym
    end
    
    def backend_module
      case backend
      when :tree_sitter
        Backends::TreeSitter
      when :citrus
        Backends::Citrus
      else # :auto
        if Backends::TreeSitter.available?
          Backends::TreeSitter
        elsif Backends::Citrus.available?
          Backends::Citrus
        else
          raise "No TOML parsing backend available"
        end
      end
    end
  end
end
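
The available? checks used above are not defined in this document. One way to implement them, assuming availability simply means "the library can be required", is sketched below. feature_available? and first_available are hypothetical helpers, and the feature names "tree_sitter" and "citrus" are assumptions about what each backend would require.

```ruby
# Hypothetical helper: report whether a library can be required.
# LoadError covers a missing gem as well as a native extension
# that cannot be loaded.
def feature_available?(feature)
  require feature
  true
rescue LoadError
  false
end

# Return the name of the first candidate whose feature loads, or nil.
# Hash order is insertion order, so earlier entries are preferred.
def first_available(candidates)
  candidates.find { |_name, feature| feature_available?(feature) }&.first
end

# Prefer the native backend, fall back to pure Ruby (feature names assumed).
selected = first_available({ tree_sitter: "tree_sitter", citrus: "citrus" })
puts "selected backend: #{selected.inspect}"
```

A require-based check does not prove that the shared library for the TOML grammar itself is present, so a real tree-sitter check would likely need an extra step.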

Stage 1: Generic Citrus Wrapper (marked for extraction)

# backends/citrus/match_wrapper.rb

# GENERIC - Can move to tree_haver later
module Toml::Merge::Backends::Citrus
  class MatchWrapper
    def initialize(match, source)
      @match = match
      @source = source
    end
    
    # Type from events[0] - GENERIC
    def type
      return :unknown unless @match.respond_to?(:events)
      @match.events.first.is_a?(Symbol) ? @match.events.first : :unknown
    end
    
    # Position info - GENERIC
    def start_byte
      @match.offset
    end
    
    def end_byte
      @match.offset + @match.length
    end
    
    def start_point
      calculate_point(@match.offset)
    end
    
    def end_point
      calculate_point(@match.offset + @match.length)
    end
    
    # Text extraction - GENERIC
    def text
      @match.string
    end
    
    # Child traversal - GENERIC
    def children
      return [] unless @match.respond_to?(:matches)
      @match.matches.map { |m| MatchWrapper.new(m, @source) }
    end
    
    # Captures - GENERIC
    def captures
      @match.captures
    end
    
    private
    
    def calculate_point(offset)
      before = @source[0...offset]
      # Search only the text before the offset; a negative start index
      # passed to rindex would wrap around to the end of the string.
      line_start = before.rindex("\n") || -1
      { row: before.count("\n"), column: offset - line_start - 1 }
    end
  end
end
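
A standalone version of the offset-to-point conversion makes the edge cases easy to check, in particular offset 0 and the first character after a newline. point_for is a hypothetical helper name, and this sketch treats the offset as a character index; text with multi-byte characters would need byte-aware slicing.

```ruby
# Convert a character offset into a zero-based {row:, column:} pair.
def point_for(source, offset)
  before = source[0...offset]
  row = before.count("\n")
  last_newline = before.rindex("\n")
  # Without a preceding newline the column is the offset itself.
  column = last_newline ? offset - last_newline - 1 : offset
  { row: row, column: column }
end

source = "[section]\nkey = 'value'"
point_for(source, 0)   # => {row: 0, column: 0}
point_for(source, 10)  # => {row: 1, column: 0}
point_for(source, 14)  # => {row: 1, column: 4}
```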

Stage 1: TOML-Specific Adapter

# backends/citrus/node_adapter.rb

# TOML-SPECIFIC - Stays in toml-merge
module Toml::Merge::Backends::Citrus
  class NodeAdapter
    def initialize(wrapped_match)
      @wrapped = wrapped_match
      @match = wrapped_match.instance_variable_get(:@match)
    end
    
    # Delegate generic methods
    def type; @wrapped.type; end
    def start_byte; @wrapped.start_byte; end
    def end_byte; @wrapped.end_byte; end
    def text; @wrapped.text; end
    def children; @wrapped.children.map { |c| NodeAdapter.new(c) }; end
    
    # TOML-specific type checks
    def table?
      type == :table
    end
    
    def array_of_tables?
      type == :table_array
    end
    
    def pair?
      type == :keyvalue
    end
    
    # TOML-specific extraction
    def table_name
      return unless table? || array_of_tables?
      
      # Use toml-rb's semantic layer
      if @match.respond_to?(:value) && @match.value.respond_to?(:full_key)
        @match.value.full_key
      end
    end
    
    def key_name
      return unless pair?
      
      if @match.respond_to?(:value) && @match.value.respond_to?(:dotted_keys)
        @match.value.dotted_keys.join(".")
      end
    end
    
    def value_node
      return unless pair?
      
      # Get value from captures
      if @wrapped.captures[:v]
        val_match = @wrapped.captures[:v].first
        NodeAdapter.new(MatchWrapper.new(val_match, @wrapped.instance_variable_get(:@source)))
      end
    end
  end
end
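
The hand-written delegation above could also be expressed with Ruby's standard Forwardable module. The sketch below is an alternative shape, not the adapter as designed; SketchWrapped and SketchAdapter are hypothetical stand-ins.

```ruby
require "forwardable"

# Stand-in for the generic wrapper; only the readers used below exist.
SketchWrapped = Struct.new(:type, :start_byte, :end_byte, :text)

class SketchAdapter
  extend Forwardable

  # Generic methods are forwarded to the wrapped node.
  def_delegators :@wrapped, :type, :start_byte, :end_byte, :text

  def initialize(wrapped)
    @wrapped = wrapped
  end

  # TOML-specific predicate layered on top of the generic interface.
  def table?
    type == :table
  end
end

adapter = SketchAdapter.new(SketchWrapped.new(:table, 0, 9, "[section]"))
adapter.table?  # => true
adapter.text    # => "[section]"
```

Keeping the forwarded list in one place makes the generic/TOML-specific boundary visible at a glance, which may help when the generic half is extracted.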

Stage 2: After Extraction to tree_haver

# tree_haver/lib/tree_haver/backends/citrus/node.rb

module TreeHaver::Backends::Citrus
  class Node
    # Exact same code as MatchWrapper from Stage 1
    # Just moved location!
    
    def initialize(match, source)
      @match = match
      @source = source
    end
    
    def type
      return :unknown unless @match.respond_to?(:events)
      @match.events.first.is_a?(Symbol) ? @match.events.first : :unknown
    end
    
    # ... all the generic methods
  end
end

# toml-merge/lib/toml/merge/backends/citrus/adapter.rb

module Toml::Merge::Backends::Citrus
  class Adapter
    def initialize(tree_haver_node)
      @node = tree_haver_node
      @match = tree_haver_node.instance_variable_get(:@match)
    end
    
    # Delegate generic methods to tree_haver
    def type; @node.type; end
    def start_byte; @node.start_byte; end
    def text; @node.text; end
    # ...
    
    # TOML-specific logic (same as Stage 1)
    def table?; type == :table; end
    # def table_name ... (body unchanged from Stage 1)
    # def key_name ...   (body unchanged from Stage 1)
    # ...
  end
end

Signature Generation: Backend-Agnostic

# node_wrapper.rb (works with both backends)

class NodeWrapper
  def initialize(node, backend:, **options)
    @backend = backend
    
    case backend
    when :tree_sitter
      @adapter = Backends::TreeSitter::NodeAdapter.new(node)
    when :citrus
      @adapter = Backends::Citrus::NodeAdapter.new(node)
    else
      raise ArgumentError, "Unknown backend: #{backend.inspect}"
    end
  end
  
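  def signature
    # Ask the adapter instead of matching raw type names, which differ
    # between backends (the Citrus adapter reports :table_array and :keyvalue).
    if @adapter.table?
      [:table, table_name]
    elsif @adapter.array_of_tables?
      [:array_of_tables, table_name]
    elsif @adapter.pair?
      [:pair, key_name]
    # ... etc
    end
  end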
  
  # All semantic methods work regardless of backend
  def table?; @adapter.table?; end
  def table_name; @adapter.table_name; end
  def key_name; @adapter.key_name; end
end
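
Signatures are presumably what lets nodes from two files be lined up regardless of which backend parsed them. The sketch below illustrates that idea with plain Structs; it is not toml-merge's merge algorithm, and SigNode and match_by_signature are hypothetical names.

```ruby
# Stand-in for a wrapped node: only `signature` and `text` are assumed.
SigNode = Struct.new(:signature, :text)

# Pair each destination node with the template node that has the same
# signature; unmatched destination nodes are paired with nil.
def match_by_signature(template_nodes, dest_nodes)
  index = template_nodes.to_h { |node| [node.signature, node] }
  dest_nodes.map { |node| [index[node.signature], node] }
end

template = [
  SigNode.new([:table, "section"], "[section]"),
  SigNode.new([:pair, "key"], "key = 'new'")
]
dest = [
  SigNode.new([:pair, "key"], "key = 'old'"),
  SigNode.new([:pair, "extra"], "extra = 1")
]

pairs = match_by_signature(template, dest)
pairs.each do |from_template, from_dest|
  puts "#{from_dest.text} <-> #{from_template ? from_template.text : '(no match)'}"
end
```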

FileAnalysis: Backend Selection

# file_analysis.rb

class FileAnalysis
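  def initialize(source, backend: nil, **options)
    @source = source
    # `backend:` may be a Symbol (as in `backend: :citrus`), a backend
    # module, or nil for the globally configured default.
    @backend =
      case backend
      when :tree_sitter then Backends::TreeSitter
      when :citrus then Backends::Citrus
      when nil then Toml::Merge.backend_module
      else backend
      end
    
    # Parse using selected backend
    @ast = @backend.parse(source)
    @statements = integrate_nodes
  end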
  
  private
  
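  def integrate_nodes
    # Compare with ==: `case`/`when` would call Module#===, which tests
    # instance membership and never matches the module object itself.
    if @backend == Backends::TreeSitter
      # Existing tree-sitter logic
      integrate_tree_sitter_nodes
    elsif @backend == Backends::Citrus
      # New citrus logic
      integrate_citrus_nodes
    else
      raise ArgumentError, "Unknown backend: #{@backend.inspect}"
    end
  end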
  
  def integrate_citrus_nodes
    return [] unless @ast
    
    result = []
    @ast.matches.each do |match|
      next if match.value.nil? # Skip whitespace
      
      # NodeWrapper builds the Citrus NodeAdapter itself, so pass the
      # generic wrapper rather than wrapping the match twice.
      wrapped = Backends::Citrus::MatchWrapper.new(match, @source)
      
      result << NodeWrapper.new(wrapped, backend: :citrus, source: @source)
    end
    
    result.sort_by { |node| node.start_line || 0 }
  end
end
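
One Ruby detail matters when dispatching on a backend module: `when SomeModule` calls `SomeModule === value`, which asks whether the value is an instance of that module, not whether it is the module. The sketch below shows the difference with two empty placeholder modules (SketchTreeSitter and SketchCitrus are hypothetical names).

```ruby
module SketchTreeSitter; end
module SketchCitrus; end

# `when SketchCitrus` evaluates SketchCitrus === backend, i.e.
# "is backend an instance of SketchCitrus?", which is false for
# the module object itself.
def dispatch_with_case(backend)
  case backend
  when SketchTreeSitter then :tree_sitter
  when SketchCitrus then :citrus
  else :no_match
  end
end

# == compares the module objects directly.
def dispatch_with_equality(backend)
  if backend == SketchTreeSitter
    :tree_sitter
  elsif backend == SketchCitrus
    :citrus
  else
    :no_match
  end
end

dispatch_with_case(SketchCitrus)      # => :no_match
dispatch_with_equality(SketchCitrus)  # => :citrus
```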

Testing Strategy

Stage 1: Backend-Specific Tests

# spec/toml/merge/backends/tree_sitter_spec.rb
RSpec.describe Toml::Merge::Backends::TreeSitter do
  it "parses TOML correctly" do
    # tree-sitter specific tests
  end
end

# spec/toml/merge/backends/citrus_spec.rb
RSpec.describe Toml::Merge::Backends::Citrus do
  it "parses TOML correctly" do
    # citrus specific tests
  end
end

Stage 1: Shared Examples

# spec/toml/merge/backends/shared_examples.rb
RSpec.shared_examples "TOML backend" do |backend_name|
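  before do
    Toml::Merge.backend = backend_name
  end
  
  # Reset the global so the choice does not leak into other specs.
  after do
    Toml::Merge.backend = nil
  end
  
  it "parses tables" do
    source = "[section]\nkey = 'value'"
    analysis = Toml::Merge::FileAnalysis.new(source)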
    
    expect(analysis.valid?).to be true
    expect(analysis.statements.size).to eq 2
    expect(analysis.statements.first.table?).to be true
  end
  
  it "generates correct signatures" do
    # Test signature generation works with both backends
  end
end

# Run against both backends
RSpec.describe "TreeSitter backend" do
  include_examples "TOML backend", :tree_sitter
end

RSpec.describe "Citrus backend" do
  include_examples "TOML backend", :citrus
end

Performance Expectations

Stage 1 Benchmarks

# benchmark/backends_comparison.rb

require "benchmark/ips"
require "toml/merge" # assumed entry point for the toml-merge gem

toml_sample = File.read("fixtures/large.toml")

Benchmark.ips do |x|
  x.report("tree-sitter") do
    Toml::Merge.backend = :tree_sitter
    Toml::Merge::FileAnalysis.new(toml_sample)
  end
  
  x.report("citrus") do
    Toml::Merge.backend = :citrus
    Toml::Merge::FileAnalysis.new(toml_sample)
  end
  
  x.compare!
end

Expected Results:

  • tree-sitter: Faster (native C)
  • citrus: 2-5x slower (acceptable for fallback)

Acceptable If:

  • Citrus within 10x of tree-sitter
  • Both handle real-world files in < 1 second
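
The acceptance criteria above can be turned into a mechanical check. The sketch below uses only the standard library's Benchmark.realtime; time_once and acceptable? are hypothetical helpers, and the two thresholds are the ones stated in the list.

```ruby
require "benchmark"

# Thresholds taken from the criteria above.
MAX_SLOWDOWN = 10.0 # fallback may be at most 10x slower
MAX_SECONDS = 1.0   # each run must finish in under one second

# Time one call of the given block, in seconds.
def time_once(&block)
  Benchmark.realtime(&block)
end

# True when both runs finish in under MAX_SECONDS and the fallback
# time is within MAX_SLOWDOWN of the native time.
def acceptable?(native_seconds, fallback_seconds)
  return false if native_seconds >= MAX_SECONDS || fallback_seconds >= MAX_SECONDS

  fallback_seconds <= native_seconds * MAX_SLOWDOWN
end

# Placeholder workloads; real runs would parse a fixture with each backend.
native = time_once { 1_000.times { "key = 'value'".length } }
fallback = time_once { 1_000.times { "key = 'value'".split("=") } }
puts "native=#{native}s fallback=#{fallback}s"
```

A single timed run is noisy, so a check like this is better suited to a coarse CI guard than to replacing the benchmark-ips comparison.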

Documentation Plan

Stage 1 README Updates

## Installation

### With tree-sitter (recommended)

Install the tree-sitter TOML parser:

    brew install tree-sitter-toml  # macOS
    apt-get install libtree-sitter-toml  # Linux

Then:

    gem install toml-merge

### Pure Ruby (fallback)

If tree-sitter installation fails, toml-merge automatically
falls back to a pure Ruby parser (slower but works everywhere):

    gem install toml-merge
    # Works out of the box!

### Selecting Backend

```ruby
# Auto (default): prefers tree-sitter, falls back to pure Ruby
analysis = Toml::Merge::FileAnalysis.new(source)

# Force pure Ruby (for JRuby, TruffleRuby, etc.)
ENV["TOML_MERGE_BACKEND"] = "citrus"
analysis = Toml::Merge::FileAnalysis.new(source)

# Programmatic
Toml::Merge.backend = :citrus
```

Migration Path

For Existing Users (Stage 1)

No changes required!

  • tree-sitter backend works exactly as before
  • New citrus backend is optional
  • Auto-selection is seamless

For New Users (Stage 1)

Better experience:

  • Installation "just works" (fallback to pure Ruby)
  • No native library troubleshooting
  • Works on all platforms

After Extraction (Stage 2)

Still no breaking changes!

  • Same API
  • Same behavior
  • Just cleaner implementation

Success Criteria

Stage 1 Complete When:

  • [x] Citrus backend implemented
  • [x] All tests passing with both backends
  • [x] Performance measured and acceptable
  • [x] Documentation updated
  • [x] Generic vs specific boundaries documented

Stage 2 Complete When:

  • [x] Generic code extracted to tree_haver
  • [x] toml-merge simplified
  • [x] All tests still passing
  • [x] Documentation updated
  • [x] Example for other *-merge gems

Long-term Vision

Other *-merge Gems Can Follow

json-merge/
  backends/
    tree_sitter/   # Native performance
    citrus/        # Pure Ruby via tree_haver

yaml-merge/
  backends/
    tree_sitter/   # Native performance
    citrus/        # Pure Ruby via tree_haver

bash-merge/
  backends/
    tree_sitter/   # Native performance
    citrus/        # Pure Ruby via tree_haver

All reuse tree_haver’s infrastructure!


Conclusion

This architecture provides:

  1. Immediate value - Pure Ruby fallback for toml-merge
  2. Low risk - Staged approach validates before extraction
  3. Long-term value - Foundation for entire *-merge ecosystem
  4. Clean design - Proper separation of concerns
  5. Backward compatible - No breaking changes
  6. Future-proof - Easy to add more backends later

Start implementation now! 🚀