Corrected Architecture: tree_haver + Citrus Backend

Key Correction

tree_haver remains completely grammar-agnostic!

tree_haver does NOT know about TOML, JSON, YAML, etc.
It only provides:

  1. Unified Node interface
  2. Backend abstraction (MRI/Rust/FFI/Java/Citrus)
  3. Generic grammar loading mechanism

Correct Architecture

┌─────────────────────────────────────────────────────────────────┐
│ toml-merge (TOML-SPECIFIC)                                      │
│                                                                  │
│ • Depends on toml-rb (TOML Citrus grammar)                     │
│ • Depends on tree_haver (unified interface)                     │
│ • Provides TOML semantics (table?, key_name, etc.)             │
│ • Works with ANY tree_haver backend                             │
└─────────────────────────────────────────────────────────────────┘
                               ↓
                    Uses tree_haver's API
                               ↓
┌─────────────────────────────────────────────────────────────────┐
│ tree_haver (GRAMMAR-AGNOSTIC)                                   │
│                                                                  │
│ • Unified Node interface (type, text, children, etc.)          │
│ • Backend selection (auto, mri, rust, ffi, java, citrus)       │
│ • Grammar loading abstraction                                   │
│ • NO knowledge of specific languages                            │
└─────────────────────────────────────────────────────────────────┘
                               ↓
              Delegates to selected backend
                               ↓
    ┌──────────────────────────────────────────────────┐
    │                                                   │
    ↓                                                   ↓
┌─────────────────────┐                   ┌──────────────────────┐
│ Tree-sitter Backends│                   │ Citrus Backend       │
│                     │                   │                      │
│ • MRI               │                   │ • Generic Citrus     │
│ • Rust              │                   │   wrapper            │
│ • FFI               │                   │ • Accepts any        │
│ • Java              │                   │   Citrus grammar     │
│                     │                   │ • NO TOML knowledge  │
│ All use tree-sitter │                   │                      │
│ with language libs  │                   │ Uses Citrus parser   │
└─────────────────────┘                   └──────────────────────┘
         ↓                                          ↓
  libtree-sitter-toml                      Citrus + grammar module
  (native library)                         (provided by toml-rb)

What Each Layer Does

Layer 1: tree_haver (Generic Parser Interface)

Purpose: Provide a unified API across different parsing backends

What it knows:

  • ✅ How to create Node objects
  • ✅ How to traverse ASTs
  • ✅ How to extract positions/text
  • ✅ How to switch backends

What it DOESN’T know:

  • ❌ TOML syntax
  • ❌ What a “table” is
  • ❌ What a “key-value pair” is
  • ❌ Any language-specific semantics

Example tree_haver API:

parser = TreeHaver::Parser.new
parser.language = some_grammar  # Generic - works with ANY grammar
tree = parser.parse(source)

node = tree.root_node.children.first  # first top-level node
node.type        # => :table (just a symbol from grammar)
node.start_byte  # => 0
node.text        # => "[section]"
node.children    # => [...]

# tree_haver doesn't know what :table means!
# It just provides the data.
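
As a minimal sketch of what "just provides the data" means, the traversal below relies only on the generic readers shown above (type, text, children). StubNode and walk are hypothetical stand-ins written for this illustration; they are not part of tree_haver.

```ruby
# Hypothetical stand-in for a generic node: it exposes data, not meaning.
StubNode = Struct.new(:type, :text, :children)

# Depth-first, pre-order walk. It never inspects what a type means,
# so it works for any grammar that yields nodes of this shape.
def walk(node, depth = 0, &visit)
  visit.call(node, depth)
  node.children.each { |child| walk(child, depth + 1, &visit) }
end

root = StubNode.new(:document, "[section]\nkey = 1", [
  StubNode.new(:table, "[section]", []),
  StubNode.new(:pair, "key = 1", [
    StubNode.new(:key, "key", []),
    StubNode.new(:integer, "1", []),
  ]),
])

seen = []
walk(root) { |node, depth| seen << [node.type, depth] }
# seen => [[:document, 0], [:table, 1], [:pair, 1], [:key, 2], [:integer, 2]]
```

The walker would behave the same if the symbols came from a JSON or YAML grammar, which is the property the generic layer is meant to preserve.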

Layer 2: toml-merge (TOML Semantics)

Purpose: Understand TOML structure and provide merge logic

What it knows:

  • ✅ TOML syntax and semantics
  • ✅ What :table means (it’s a TOML section)
  • ✅ What :keyvalue/:pair means (TOML key-value)
  • ✅ How to extract table names
  • ✅ How to merge TOML files

What it DOESN’T know:

  • ❌ Which backend tree_haver is using
  • ❌ How parsing actually works
  • ❌ Tree-sitter vs Citrus details

Example toml-merge API:

analysis = Toml::Merge::FileAnalysis.new(source)
# Internally uses tree_haver, doesn't care which backend

node = analysis.statements.first
node.table?       # => true (TOML-specific method)
node.table_name   # => "section" (TOML-specific extraction)

# toml-merge adds TOML understanding on top of tree_haver!
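
The layering can be sketched as a thin wrapper that holds all TOML knowledge while the wrapped node stays generic. GenericNode and TomlNode are hypothetical names for this illustration, and extracting the name from the header text with a regular expression is an assumption of the sketch, not necessarily how toml-merge does it.

```ruby
# Hypothetical stand-in for a generic tree_haver node.
GenericNode = Struct.new(:type, :text, :children)

# TOML-aware wrapper: every TOML-specific decision lives here.
class TomlNode
  def initialize(node)
    @node = node
  end

  def table?
    @node.type == :table
  end

  # Assumes (for this sketch only) that a table's text starts with "[name]".
  def table_name
    return unless table?

    @node.text[/\A\[\s*(.+?)\s*\]/, 1]
  end
end

table = TomlNode.new(GenericNode.new(:table, "[section]", []))
pair  = TomlNode.new(GenericNode.new(:pair, "key = 1", []))

table.table_name  # => "section"
pair.table_name   # => nil
```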

Backend Comparison

Tree-sitter Backends (MRI/Rust/FFI/Java)

# tree_haver loads tree-sitter grammar
language = TreeHaver::Language.load("toml", "/path/to/libtree-sitter-toml.so")
parser = TreeHaver::Parser.new
parser.language = language
tree = parser.parse(source)

# Returns nodes with types like:
# :table, :pair, :array, :string, etc.
# (from tree-sitter TOML grammar)

Citrus Backend (NEW)

# tree_haver accepts Citrus grammar module
parser = TreeHaver::Parser.new
parser.language = TomlRB::Document  # Citrus grammar module
tree = parser.parse(source)

# Returns nodes with types like:
# :table, :keyvalue, :array, :string, etc.
# (from Citrus TOML grammar - slightly different names!)

Key Design Principle

tree_haver provides transport; toml-merge provides interpretation.

Analogy: HTTP vs Web Application

HTTP (tree_haver):
  - Transports bytes
  - Doesn't know about JSON, HTML, etc.
  - Just provides: headers, body, status

Web App (toml-merge):
  - Interprets JSON/HTML
  - Knows what data means
  - Uses HTTP for transport

Same with parsing:

tree_haver:
  - Transports AST nodes
  - Doesn't know about TOML, JSON, etc.
  - Just provides: type, text, children

toml-merge:
  - Interprets TOML structure
  - Knows what nodes mean
  - Uses tree_haver for parsing

Implementation Plan - UPDATED

Phase 1: Refactor to use tree_haver (THIS FIRST!)

Goal: Make toml-merge use tree_haver’s existing backends

# BEFORE: Direct tree-sitter usage
require "tree_sitter"
language = TreeSitter::Language.load(...)
parser = TreeSitter::Parser.new
parser.language = language

# AFTER: Use tree_haver
require "tree_haver"
# tree_haver auto-detects best backend (MRI/Rust/FFI/Java)
# and handles language loading via GrammarFinder
parser = TreeHaver::Parser.new
parser.language = TreeHaver::Language.toml

Changes needed:

  1. Update FileAnalysis to use TreeHaver::Parser
  2. Update NodeWrapper to work with TreeHaver::Node
  3. Remove direct tree-sitter references
  4. Add tree_haver dependency
  5. Test with all tree_haver backends

Benefits:

  • ✅ Works on JRuby (via FFI or Java backend)
  • ✅ Works on TruffleRuby (via FFI)
  • ✅ Can use Rust backend (tree_stump)
  • ✅ Automatically picks best backend
  • ✅ Sets foundation for Citrus backend

Phase 2: Add Citrus backend to tree_haver

Goal: Add Citrus as another backend option in tree_haver

In tree_haver:

# tree_haver/lib/tree_haver/backends/citrus.rb
module TreeHaver::Backends
  module Citrus
    class Node
      def initialize(match, source)
        @match = match  # Generic Citrus::Match
        @source = source
      end
      
      def type
        # Extract from events[0] - GENERIC
        @match.events.first.is_a?(Symbol) ? @match.events.first : :unknown
      end
      
      def start_byte; @match.offset; end
      def end_byte; @match.offset + @match.length; end
      def start_point; calculate_point(@match.offset); end
      def end_point; calculate_point(@match.offset + @match.length); end
      def text; @match.string; end
      
      def children
        @match.matches.map { |m| Node.new(m, @source) }
      end
      
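      # NO TOML KNOWLEDGE!

      private

      # Converts an offset into a zero-based row/column pair.
      # The { row:, column: } return shape is an assumption of this sketch.
      def calculate_point(offset)
        before = @source[0, offset]
        last_newline = before.rindex("\n") || -1
        { row: before.count("\n"), column: offset - last_newline - 1 }
      end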
    end
    
    class Parser
      def initialize
        @grammar = nil
      end
      
      # Accept any Citrus grammar module
      def language=(grammar_module)
        unless grammar_module.respond_to?(:parse)
          raise ArgumentError, "Grammar must respond to :parse"
        end
        @grammar = grammar_module
      end
      
      def parse(source)
        raise "No grammar loaded" unless @grammar
        
        begin
          parsed = @grammar.parse(source)
          Tree.new(parsed, source)
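        # Leading :: is required: inside this module a bare `Citrus`
        # resolves to TreeHaver::Backends::Citrus, not the citrus gem.
        rescue ::Citrus::ParseError => e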
          raise TreeHaver::ParseError, e.message
        end
      end
    end
    
    class Tree
      def initialize(citrus_match, source)
        @root = Node.new(citrus_match, source)
      end
      
      def root_node
        @root
      end
    end
    
    def self.available?
      require "citrus"
      true
    rescue LoadError
      false
    end
    
    def self.capabilities
      {
        backend: :citrus,
        parse: true,
        query: false,  # Citrus doesn't have query API
        bytes_field: true,
        incremental: false,
      }
    end
  end
end
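
To illustrate that the wrapper is grammar-agnostic, the sketch below feeds it match-like objects carrying JSON-flavored rule names instead of TOML ones. FakeMatch is a hypothetical stand-in that exposes only the readers the wrapper above calls (events, offset, length, string, matches); it is not a real Citrus::Match, and GenericNode is a trimmed copy of the Node class for this example.

```ruby
# Hypothetical stand-in for a Citrus match; only the readers used below.
FakeMatch = Struct.new(:events, :offset, :string, :matches) do
  def length
    string.length
  end
end

# Trimmed copy of the generic wrapper: no language-specific branches.
class GenericNode
  def initialize(match)
    @match = match
  end

  def type
    first = @match.events.first
    first.is_a?(Symbol) ? first : :unknown
  end

  def start_byte
    @match.offset
  end

  def end_byte
    @match.offset + @match.length
  end

  def text
    @match.string
  end

  def children
    @match.matches.map { |m| GenericNode.new(m) }
  end
end

# Pretend source: {"a":1}. The member spans offsets 1 through 5.
name   = FakeMatch.new([:string], 1, '"a"', [])
number = FakeMatch.new(["anonymous"], 5, "1", [])  # no Symbol event
member = GenericNode.new(FakeMatch.new([:member], 1, '"a":1', [name, number]))

member.type                  # => :member
member.children.map(&:type)  # => [:string, :unknown]
member.end_byte              # => 6
```

The symbols pass through untouched, so interpreting :member versus :keyvalue is left entirely to the consuming gem.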

Phase 3: Make toml-merge work with Citrus backend

Goal: toml-merge automatically works when tree_haver uses Citrus

In toml-merge:

# Gemfile
gem "tree_haver"
gem "toml-rb", require: false  # Only loaded when the Citrus backend is active

# lib/toml/merge/file_analysis.rb
class FileAnalysis
  def initialize(source, **options)
    @source = source
    @parser = TreeHaver::Parser.new
    
    # Load appropriate grammar based on backend.
    # Compare with ==: `case`/`when` uses Module#===, which tests instance
    # membership and would never match the backend module itself.
    if TreeHaver.backend_module == TreeHaver::Backends::Citrus
      # Citrus backend: load toml-rb grammar
      require "toml-rb"
      @parser.language = TomlRB::Document
    else
      # Tree-sitter backends: use registered TOML language
      @parser.language = TreeHaver::Language.toml
    end
    
    @tree = @parser.parse(source)
    # ... rest works the same!
  end
end

# lib/toml/merge/node_wrapper.rb
class NodeWrapper
  def initialize(node, **options)
    @node = node  # TreeHaver::Node (works with any backend!)
    # ... TOML-specific logic
  end
  
  def table?
    # Arrays of tables (:array_of_tables in tree-sitter, :table_array in
    # Citrus) are a distinct type and are deliberately excluded here.
    @node.type == :table
  end
  
  def table_name
    return unless table?
    
    # Extract from node structure - works with both backends!
    # tree-sitter and Citrus might have slightly different structures
    # but both expose :table type with name information
    extract_table_name_from_node(@node)
  end
end

Type Name Mapping

The main challenge: the tree-sitter and Citrus TOML grammars use slightly different node type names.

Tree-sitter TOML grammar:

table              [section]
array_of_tables    [[items]]
pair               key = value
string             "value"
integer            42
array              [1, 2, 3]

Citrus TOML grammar (toml-rb):

table              [section]
table_array        [[items]]
keyvalue           key = value
basic_string       "value"
integer            42
array              [1, 2, 3]

Solution in toml-merge:

def normalize_type(type)
  case type
  when :keyvalue then :pair
  when :table_array then :array_of_tables
  when :basic_string, :literal_string then :string
  else type
  end
end
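
As a quick check of the mapping, the sketch below runs the type names from the two tables above through normalize_type and confirms that the Citrus names converge on the tree-sitter ones. The two arrays are copied from those tables; nothing else is assumed.

```ruby
def normalize_type(type)
  case type
  when :keyvalue then :pair
  when :table_array then :array_of_tables
  when :basic_string, :literal_string then :string
  else type
  end
end

# Same constructs, in the same order, as named by each grammar.
tree_sitter_types = %i[table array_of_tables pair string integer array]
citrus_types      = %i[table table_array keyvalue basic_string integer array]

normalized = citrus_types.map { |type| normalize_type(type) }
normalized == tree_sitter_types  # => true
```

Tree-sitter names pass through unchanged, so the same helper can be applied regardless of which backend produced the node.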

Dependencies

tree_haver

# tree_haver.gemspec
spec.add_dependency "citrus", "~> 3.0"  # For Citrus backend (pure-Ruby fallback)

# The native backends are optional; install whichever suits the platform:
# - ruby_tree_sitter (MRI)
# - tree_stump (Rust)
# - ffi (FFI)
# - java-tree-sitter (Java)

# tree_haver picks best available

toml-merge

# toml-merge.gemspec
spec.add_dependency "tree_haver", "~> 1.0"
spec.add_dependency "toml-rb", "~> 3.0"  # For Citrus grammar

# toml-rb provides:
# 1. TOML Citrus grammar (TomlRB::Document)
# 2. Semantic layer (TomlRB::Table, etc.) - we might use this

Usage Examples

Auto-select (default)

require "toml-merge"

# tree_haver picks best backend:
# - MRI: ruby_tree_sitter (if available)
# - Rust: tree_stump (if available)
# - JRuby: java-tree-sitter or FFI
# - Fallback: Citrus (pure Ruby)

analysis = Toml::Merge::FileAnalysis.new(source)
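
The priority order described in the comments above can be sketched as "first backend whose probe succeeds wins". pick_backend and the lambda probes are hypothetical; a real implementation would presumably attempt to load the backing gem instead of returning a fixed value.

```ruby
# Returns the name of the first candidate whose probe reports success.
# Candidates are checked in insertion order, which encodes the priority.
def pick_backend(candidates)
  name, _probe = candidates.find { |_name, probe| probe.call }
  name or raise "no parsing backend available"
end

candidates = {
  mri:    -> { false },  # pretend ruby_tree_sitter is not installed
  rust:   -> { false },  # pretend tree_stump is not installed
  citrus: -> { true },   # pure-Ruby fallback
}

pick_backend(candidates)  # => :citrus
```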

Force Citrus (pure Ruby)

require "toml-merge"

TreeHaver.backend = :citrus
analysis = Toml::Merge::FileAnalysis.new(source)

Environment variable

export TREE_HAVER_BACKEND=citrus
ruby my_script.rb
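
How tree_haver reads this variable is not shown here, so the following is only a sketch of one plausible approach: normalize the value, default to auto, and reject names outside the backend list given earlier (auto, mri, rust, ffi, java, citrus). backend_from_env is a hypothetical helper, not tree_haver API.

```ruby
# Resolves the requested backend from an environment-like hash.
# Accepts any object responding to #fetch(key, default), such as ENV.
def backend_from_env(env = ENV)
  known = %i[auto mri rust ffi java citrus]
  value = env.fetch("TREE_HAVER_BACKEND", "auto").strip.downcase.to_sym
  return value if known.include?(value)

  raise ArgumentError, "unknown backend: #{value}"
end

backend_from_env({ "TREE_HAVER_BACKEND" => "citrus" })  # => :citrus
backend_from_env({})                                    # => :auto
```

Failing loudly on an unknown name avoids silently falling back to a different backend than the one the user asked for.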

Benefits of This Architecture

1. Clean Separation

  • tree_haver: Generic parsing mechanics
  • toml-merge: TOML-specific semantics
  • No grammar knowledge in tree_haver!

2. Reusability

  • tree_haver’s Citrus backend works for ANY Citrus grammar
  • json-merge could use it with a JSON Citrus grammar
  • yaml-merge could use it with a YAML Citrus grammar
  • bash-merge could use it with a Bash Citrus grammar

3. Flexibility

  • toml-merge works with ALL tree_haver backends
  • Users can choose backend based on their needs
  • Graceful fallback to pure Ruby

4. Maintainability

  • tree_haver handles parsing infrastructure
  • toml-merge focuses on TOML logic
  • Clear boundaries

What Gets Built Where

tree_haver gains:

lib/tree_haver/backends/
  citrus/
    node.rb         # Generic Citrus::Match wrapper
    parser.rb       # Generic grammar loading
    tree.rb         # Tree structure
    point.rb        # Position calculation

toml-merge keeps:

lib/toml/merge/
  file_analysis.rb     # Uses TreeHaver::Parser
  node_wrapper.rb      # TOML semantics on TreeHaver::Node
  smart_merger.rb      # TOML merge logic
  conflict_resolver.rb # TOML conflict handling

Summary

Key Points:

  1. ✅ tree_haver remains grammar-agnostic
  2. ✅ Citrus backend in tree_haver is generic (no TOML knowledge)
  3. ✅ toml-merge provides TOML-specific logic (works with any backend)
  4. ✅ toml-rb provides the TOML Citrus grammar
  5. ✅ Architecture is clean and reusable

Implementation Order:

  1. FIRST: Refactor toml-merge to use tree_haver (replace direct tree-sitter)
  2. SECOND: Add Citrus backend to tree_haver (generic wrapper)
  3. THIRD: Ensure toml-merge works with Citrus backend

This is the correct architecture! 🎯