Annotations

Tursio provides mechanisms to add annotations, or rules, to the semantic knowledge graph. These guide the query engine in selecting and using the schema appropriately. Annotations are similar to hints in traditional query engines: they influence how the LLM interprets questions and generates SQL queries.

Overview

Column annotation rules can be specified within column descriptions. They help the query engine:

  • Prioritize columns when multiple options exist
  • Enforce specific column usage for business terms
  • Handle hierarchical relationships between columns
  • Document general column information without affecting query generation directly

Rule Types

Prioritization Rules

Syntax: prioritization_rule: <rule_text>

Defines how to choose one column over another when multiple column choices are available. Use this to guide the selection process based on criteria like granularity, completeness, or relevance.

Example:

"prioritization_rule: prefer this column over major_category when
available, as it has finer granularity."

Use Cases:

  • Prefer finer-granularity columns when filtering
  • Prefer more complete or accurate columns
  • Prefer columns with better data quality

Enforcer Rules

Syntax: enforcer_rule: <rule_text>

Defines which specific column to use when certain business terms are mentioned in a question. These rules must be followed strictly by the query engine.

Example:

"enforcer_rule: use this column when the question mentions
'category code' or 'category identifier'. The category code always has a
strict form of 6 alphanumeric characters."

Use Cases:

  • Map business terminology to specific columns
  • Enforce column usage for specific business concepts
  • Ensure consistent column selection for domain-specific terms

Hierarchical Rules

Syntax: hierarchical_rule: <rule_text>

Defines how to handle hierarchical columns (e.g., major_category, sub_category, category_description). These rules specify how to filter and combine hierarchical levels.

Example:

"hierarchical_rule: this is part of a hierarchy with
major_category and category_description. sub_category has finer granularity
than major_category but coarser granularity than category_description.
Use the finer granularity level to filter if the filter value is specific.
Use AND conditions, not OR, when filtering on multiple levels."

Use Cases:

  • Define hierarchical relationships between columns
  • Specify filtering behavior (AND vs OR) for multiple levels
  • Guide granularity selection in hierarchies
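
To make the AND-versus-OR guidance concrete, the sketch below builds a filter across two hierarchy levels. It is illustrative only: build_hierarchy_filter is a hypothetical helper, the filter values are made up, and the SQL that the query engine actually generates may look different.

```python
def build_hierarchy_filter(levels: dict[str, str]) -> tuple[str, list[str]]:
    """Combine one equality condition per hierarchy level with AND.

    Returns a parameterized WHERE fragment plus its values, so filter
    values are never interpolated into the SQL text.
    """
    conditions = [f"{column} = ?" for column in levels]
    return " AND ".join(conditions), list(levels.values())


# Hypothetical filter values, chosen only for illustration.
clause, params = build_hierarchy_filter(
    {"major_category": "Electronics", "sub_category": "Laptops"}
)
print(clause)  # major_category = ? AND sub_category = ?
print(params)  # ['Electronics', 'Laptops']
```

Joining the same conditions with OR would return every row that matches either level, which broadens the result instead of narrowing it to the specific sub-category.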

General Description

Syntax: general_description: <description_text>

Provides general information about a column that does not affect query generation directly. This content is used for context purposes only.

Example:

"general_description: This column contains the human-readable
category names."

Note: Content within general_description: blocks is completely ignored during rule extraction and will not influence query generation.

Multiple Rules

You can specify multiple rules in a single column description by including multiple rule keywords. Rules are separated by their keywords and can appear in any order.

Example:

"hierarchical_rule: this is part of a hierarchy with major_category and
category_description. Use AND conditions when filtering multiple levels, not OR.
prioritization_rule: prefer this over major_category when available."

Syntax Rules:

  • Rule keywords must be followed by a colon (:) and a space
  • Rule keywords are case-insensitive
  • Rule text is extracted from the keyword until the next rule keyword or end of string
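
The syntax rules above can be sketched as a small parser. This is a minimal sketch under the assumption that extraction works exactly as the three bullets describe; extract_rules is a hypothetical name and this is not Tursio's actual implementation.

```python
import re

# Keywords taken from this page; the parsing logic itself is an assumption.
RULE_KEYWORDS = (
    "prioritization_rule",
    "enforcer_rule",
    "hierarchical_rule",
    "general_description",
)

# Match any keyword followed by a colon and a space, ignoring case.
_KEYWORD_RE = re.compile(
    r"\b(" + "|".join(RULE_KEYWORDS) + r"): ",
    re.IGNORECASE,
)


def extract_rules(description: str) -> list[tuple[str, str]]:
    """Split a column description into (keyword, rule_text) pairs."""
    matches = list(_KEYWORD_RE.finditer(description))
    rules = []
    for i, match in enumerate(matches):
        # Rule text runs until the next keyword or the end of the string.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(description)
        text = " ".join(description[match.end():end].split())
        rules.append((match.group(1).lower(), text))
    return rules


description = (
    "hierarchical_rule: this is part of a hierarchy. Use AND, not OR. "
    "Prioritization_Rule: prefer this over major_category when available."
)
for keyword, text in extract_rules(description):
    print(f"{keyword} -> {text}")
```

A caller that wants to mirror the note on general_description could drop pairs whose keyword is general_description before using the result.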

Best Practices

  1. Be Specific: Provide clear, actionable guidance in rule text.
  2. Use Hierarchical Rules: When columns form a hierarchy, use hierarchical_rule to define relationships and filtering behavior.
  3. Combine Rules: Use multiple rule types together when needed (e.g., both prioritization and hierarchical rules).
  4. Test Rules: Verify that rules produce the expected query behavior.
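
Before testing query behavior, a quick static check can catch malformed keywords in a description. The following is a minimal sketch: find_malformed_keywords is a hypothetical helper, and Tursio may validate descriptions differently or not at all.

```python
import re

KNOWN_KEYWORDS = {
    "prioritization_rule",
    "enforcer_rule",
    "hierarchical_rule",
    "general_description",
}

# Any word ending in _rule or _description that is directly followed by a
# colon, plus the character after the colon (empty at end of string).
_CANDIDATE_RE = re.compile(
    r"\b(\w+_(?:rule|description)):(.?)",
    re.IGNORECASE | re.DOTALL,
)


def find_malformed_keywords(description: str) -> list[str]:
    """Report misspelled keywords and keywords missing the space after ':'."""
    problems = []
    for match in _CANDIDATE_RE.finditer(description):
        keyword, following = match.group(1), match.group(2)
        if keyword.lower() not in KNOWN_KEYWORDS:
            problems.append(f"unknown keyword: {keyword}")
        elif following != " ":
            problems.append(f"missing space after colon: {keyword}")
    return problems


print(find_malformed_keywords("priority_rule: prefer this column"))
print(find_malformed_keywords("enforcer_rule:use this column"))
```

The check is deliberately loose: a column name ending in _description that happens to be followed by a colon would be reported as an unknown keyword, so treat the output as a prompt for review.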

Examples

Example 1: Hierarchical Columns

{
  "major_category": "general_description: Top-level category classification. hierarchical_rule: this is the coarsest level in the hierarchy",
  "sub_category": "hierarchical_rule: this is part of a hierarchy with major_category and description. Prioritize finer granularity. Use AND conditions when filtering multiple levels, not OR. prioritization_rule: prefer this over major_category when available",
  "description": "hierarchical_rule: this is the finest granularity level in the hierarchy. prioritization_rule: prefer this when the finest level is needed"
}

Example 2: Business Term Mapping

{
  "revenue_amount": "enforcer_rule: use this column when the question mentions 'revenue', 'sales', or 'income'",
  "cost_amount": "enforcer_rule: use this column when the question mentions 'cost', 'expense', or 'expenditure'"
}

Example 3: Data Quality Prioritization

{
  "customer_code": "prioritization_rule: use this for filtering specific customers. Don't use customer_id as it is not populated in many cases."
}
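
Descriptions like the ones in these examples can also be assembled programmatically. The sketch below is one possible approach, not a Tursio API: compose_description is a hypothetical helper, and how the resulting mapping is supplied to Tursio is not covered here.

```python
import json


def compose_description(rules: dict[str, str]) -> str:
    """Join rule texts into one description, each prefixed by 'keyword: '."""
    parts = []
    for keyword, text in rules.items():
        text = text.strip()
        # End each rule with a period so the next keyword starts cleanly.
        if not text.endswith("."):
            text += "."
        parts.append(f"{keyword}: {text}")
    return " ".join(parts)


# Column name and rule texts reuse Example 1 from this page.
columns = {
    "description": compose_description(
        {
            "hierarchical_rule": "this is the finest granularity level in the hierarchy",
            "prioritization_rule": "prefer this when the finest level is needed",
        }
    ),
}
print(json.dumps(columns, indent=2))
```

Keeping each rule type as a separate dictionary entry makes it harder to forget the colon and space after a keyword.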