Knowledge Indexing Process

How to convert documentation into Emily's searchable knowledge base.

Overview

┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│   Markdown      │     │   Extraction    │     │  Azure Search   │
│   Documentation │ ──▶ │   & Translation │ ──▶ │   Vector Store  │
└─────────────────┘     └─────────────────┘     └─────────────────┘
                                                         │
                                                         ▼
                                                ┌─────────────────┐
                                                │     Emily       │
                                                │   Assistant     │
                                                └─────────────────┘

Step 1: Documentation

All knowledge starts as markdown files in this repository:

upvendo-kb/
├── features/           # Feature documentation
├── fields/             # Field reference
├── relations/          # Cross-system relations
└── troubleshooting/    # Issue guides

Documentation Standards

  1. Use consistent structure (see templates)
  2. Include all required sections
  3. Write for clarity (a child should understand)
  4. Provide examples
  5. Document customer impact

Step 2: Extraction

Scripts in upvendo-proxy-emily/scripts/knowledge-system/ extract knowledge:

Extract from Documentation

bash
# Parse markdown files into structured JSON
node scripts/knowledge-system/extract-from-docs.cjs

This creates entries for:

  • Features (overview)
  • Fields (individual field help)
  • Business logic (how things work)
  • Troubleshooting (problem/solution)
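
As a rough illustration of what "parsing markdown into structured JSON" can involve, the sketch below splits one document into a title and its second-level sections. It is not the logic of extract-from-docs.cjs: the function name `parseMarkdownEntry`, the output shape, and the assumption that each file uses one `#` title followed by `##` sections are all hypothetical.

```javascript
// Hypothetical helper, not the actual extract-from-docs.cjs implementation.
// Splits one markdown document into a title plus a map of "## " sections.
function parseMarkdownEntry(markdown) {
  const entry = { name: '', sections: {} };
  let current = null;

  for (const line of markdown.split(/\r?\n/)) {
    const h1 = line.match(/^#\s+(.+)$/);
    const h2 = line.match(/^##\s+(.+)$/);

    if (h1 && !entry.name) {
      // The first "# " heading becomes the entry name.
      entry.name = h1[1].trim();
    } else if (h2) {
      // Each "## " heading starts a new section.
      current = h2[1].trim();
      entry.sections[current] = [];
    } else if (current !== null) {
      // Everything else belongs to the section currently being read.
      entry.sections[current].push(line);
    }
  }

  // Join collected lines and trim surrounding blank space per section.
  for (const key of Object.keys(entry.sections)) {
    entry.sections[key] = entry.sections[key].join('\n').trim();
  }
  return entry;
}
```

A real extractor would also need to handle fenced code blocks (where a line starting with `#` is a comment, not a heading) and deeper heading levels, which this sketch simply keeps as section text.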

Extract from Source Code

bash
# Parse Vue/PHP files for field definitions
node scripts/knowledge-system/extract-all.cjs

This extracts:

  • Field names and types
  • Validation rules
  • Default values
  • Route paths

Step 3: Content Generation

For entries that need enrichment:

bash
# Generate detailed content using GPT-4
node scripts/knowledge-system/generate-content.cjs

This:

  • Expands brief descriptions
  • Adds examples
  • Creates troubleshooting tips
  • Ensures consistent quality

Step 4: Translation

Translate to all supported languages:

bash
# Translate to NL, FR, DE, ES, PT, IT
node scripts/knowledge-system/translate.cjs

Languages:

  • English (EN) - Source
  • Dutch (NL)
  • French (FR)
  • German (DE)
  • Spanish (ES)
  • Portuguese (PT)
  • Italian (IT)
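
The index schema further down stores translations in suffixed fields such as `name_nl` and `content_fr`, with English in the unsuffixed field. A minimal sketch of that naming rule follows. The helper names are hypothetical, and applying the same suffix pattern to ES, PT, and IT is an assumption, since the schema table only lists NL, FR, and DE fields.

```javascript
// Language codes from the list above; 'EN' is the source language.
const SUPPORTED_LANGUAGES = ['EN', 'NL', 'FR', 'DE', 'ES', 'PT', 'IT'];

// Hypothetical helper: maps a base field and language code to an index field name.
// English uses the unsuffixed field ("content"); other languages add a suffix ("content_nl").
function localizedField(base, lang) {
  const code = lang.toUpperCase();
  if (!SUPPORTED_LANGUAGES.includes(code)) {
    throw new Error(`Unsupported language: ${lang}`);
  }
  return code === 'EN' ? base : `${base}_${code.toLowerCase()}`;
}

// Returns the translated value, falling back to English when it is missing or empty.
function pickLocalized(entry, base, lang) {
  const value = entry[localizedField(base, lang)];
  return value ? value : entry[base];
}
```

Falling back to the English text is one possible design choice; it keeps an answer available while a translation is still missing.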

Step 5: Embedding Generation

Generate vector embeddings for semantic search:

bash
# Create embeddings using Azure OpenAI
node scripts/knowledge-system/generate-embeddings.cjs

Uses: the text-embedding-ada-002 model, which returns 1536-dimensional vectors (the size of the embedding index field).
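
Semantic search works by comparing the query's embedding with each entry's embedding, commonly using cosine similarity. Azure AI Search does this comparison server-side, and the metric it uses depends on how the index's vector field is configured, so the function below is only an illustration of the idea, shown on short vectors rather than real 1536-dimensional ones.

```javascript
// Cosine similarity between two equal-length numeric vectors.
// Returns a value in [-1, 1]; higher means the vectors point in a more similar direction.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error('Vectors must have the same length');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    return 0; // Convention chosen here: a zero vector is similar to nothing.
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```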

Step 6: Upload

Upload to the vector store:

bash
# Upload to Azure AI Search
node scripts/knowledge-system/upload-knowledge.cjs

Index: emily-kb-merchant-test
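
For orientation, the Azure AI Search indexing endpoint (`docs/index`) expects a JSON body with a `value` array in which every document carries an `@search.action`, and it accepts at most 1000 documents per request. The sketch below shows one way to build such batches. `buildUploadBatches` is a hypothetical name and is not taken from upload-knowledge.cjs.

```javascript
// Azure AI Search accepts at most 1000 documents per indexing request.
const MAX_BATCH_SIZE = 1000;

// Hypothetical helper: wraps entries in the request body shape used by the
// "docs/index" REST endpoint, split into batches of at most `batchSize`.
function buildUploadBatches(entries, batchSize = MAX_BATCH_SIZE) {
  if (!Number.isInteger(batchSize) || batchSize < 1 || batchSize > MAX_BATCH_SIZE) {
    throw new RangeError(`batchSize must be an integer between 1 and ${MAX_BATCH_SIZE}`);
  }
  const batches = [];
  for (let i = 0; i < entries.length; i += batchSize) {
    const value = entries.slice(i, i + batchSize).map((entry) => ({
      // mergeOrUpload inserts new documents and updates existing ones by key.
      '@search.action': 'mergeOrUpload',
      ...entry,
    }));
    batches.push({ value });
  }
  return batches;
}
```

Each batch would then be sent as a POST to `$AZURE_SEARCH_ENDPOINT/indexes/emily-kb-merchant-test/docs/index?api-version=2023-11-01` with the `api-key` header, in the same way as the curl commands in the Verification section.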

Full Pipeline

Run the complete pipeline:

bash
# Extract, generate, translate, embed, upload
node scripts/knowledge-system/full-pipeline.cjs

Or with options:

bash
# Only process changed files
node scripts/knowledge-system/full-pipeline.cjs --incremental

# Full rebuild
node scripts/knowledge-system/full-pipeline.cjs --full

# Specific feature only
node scripts/knowledge-system/full-pipeline.cjs --feature order-capacity
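
To make the three options concrete, here is a minimal sketch of how a script could read them from the command line. It is not copied from full-pipeline.cjs; the function name and the returned object are hypothetical, and the sketch deliberately leaves the no-flag behaviour undefined because it is not documented here.

```javascript
// Hypothetical sketch of parsing the options shown above.
// mode === null means no mode flag was given; what the real script does
// in that case is not documented on this page.
function parsePipelineArgs(argv) {
  const options = { mode: null, feature: null };
  for (let i = 0; i < argv.length; i++) {
    const arg = argv[i];
    if (arg === '--incremental') {
      options.mode = 'incremental';
    } else if (arg === '--full') {
      options.mode = 'full';
    } else if (arg === '--feature') {
      const value = argv[i + 1];
      if (!value || value.startsWith('--')) {
        throw new Error('--feature requires a feature name');
      }
      options.feature = value;
      i++; // Skip the value that was just consumed.
    } else {
      throw new Error(`Unknown option: ${arg}`);
    }
  }
  return options;
}
```

In a script this would typically be called as `parsePipelineArgs(process.argv.slice(2))`.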

Azure Search Schema

Index Fields

| Field      | Type   | Purpose                                |
|------------|--------|----------------------------------------|
| id         | String | Unique identifier                      |
| type       | String | feature/field/workflow/troubleshooting |
| name       | String | English name                           |
| name_nl    | String | Dutch name                             |
| name_fr    | String | French name                            |
| name_de    | String | German name                            |
| content    | String | English content                        |
| content_nl | String | Dutch content                          |
| content_fr | String | French content                         |
| content_de | String | German content                         |
| category   | String | Feature category                       |
| route      | String | Backoffice route                       |
| tags       | Array  | Search keywords                        |
| priority   | Int    | Ranking priority                       |
| embedding  | Vector | 1536-dim embedding                     |
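
Because mismatched field names are a common cause of failed uploads, it can help to check entries against the field list before sending them. The sketch below encodes the table above as a lookup and reports problems per entry. `validateEntry` is a hypothetical helper, and treating only `id` and `type` as required is an assumption.

```javascript
// Field names and kinds copied from the index field table above.
const INDEX_FIELDS = {
  id: 'string', type: 'string',
  name: 'string', name_nl: 'string', name_fr: 'string', name_de: 'string',
  content: 'string', content_nl: 'string', content_fr: 'string', content_de: 'string',
  category: 'string', route: 'string',
  tags: 'array', priority: 'int', embedding: 'vector',
};
const ENTRY_TYPES = ['feature', 'field', 'workflow', 'troubleshooting'];
const EMBEDDING_DIMENSIONS = 1536;

// Hypothetical helper: returns a list of problems, empty when the entry looks valid.
function validateEntry(entry) {
  const errors = [];
  for (const [key, value] of Object.entries(entry)) {
    const kind = INDEX_FIELDS[key];
    if (!kind) {
      errors.push(`unknown field: ${key}`);
      continue;
    }
    const ok =
      (kind === 'string' && typeof value === 'string') ||
      (kind === 'array' && Array.isArray(value)) ||
      (kind === 'int' && Number.isInteger(value)) ||
      (kind === 'vector' && Array.isArray(value) && value.length === EMBEDDING_DIMENSIONS);
    if (!ok) {
      errors.push(`invalid value for ${key}`);
    }
  }
  // Assumption: id and type are the only fields required on every entry.
  if (typeof entry.id !== 'string' || entry.id === '') {
    errors.push('missing id');
  }
  if (!ENTRY_TYPES.includes(entry.type)) {
    errors.push(`unexpected type: ${entry.type}`);
  }
  return errors;
}
```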

Semantic Configuration

json
{
  "semantic": {
    "configurations": [{
      "name": "default",
      "prioritizedFields": {
        "titleField": { "fieldName": "name" },
        "contentFields": [{ "fieldName": "content" }]
      }
    }]
  }
}
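
A query can opt into this configuration by name. The sketch below builds a search request body for API version 2023-11-01 that uses the `default` semantic configuration and, optionally, a vector query against the `embedding` field. `buildSearchBody` is a hypothetical helper; whether Emily's proxy builds its requests this way, and whether semantic ranking is enabled on the search service, are assumptions.

```javascript
// Hypothetical helper: builds a search request body that uses the "default"
// semantic configuration, optionally combined with a vector query.
function buildSearchBody(query, { top = 5, queryVector = null } = {}) {
  const body = {
    search: query,
    queryType: 'semantic',
    semanticConfiguration: 'default',
    top,
  };
  if (queryVector) {
    // The vector must come from the same embedding model used at indexing time.
    body.vectorQueries = [
      { kind: 'vector', vector: queryVector, fields: 'embedding', k: top },
    ];
  }
  return body;
}
```

The resulting object can be serialized with `JSON.stringify` and sent to the `docs/search` endpoint shown in the Verification section.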

Cost Estimates

| Operation                          | Cost        |
|------------------------------------|-------------|
| Initial full index (~1000 entries) | ~$15-25     |
| Incremental update (10 entries)    | ~$0.10-0.20 |
| Emily chat query                   | ~$0.01-0.03 |
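
These figures can be turned into a rough budget for a knowledge base of a different size. The sketch below scales the full-index range by entry count. Linear scaling is an assumption, and the function name is hypothetical; actual cost depends on content length and the number of languages.

```javascript
// Cost range for a full index of ~1000 entries, in USD, from the table above.
const FULL_INDEX_COST_PER_1000 = { low: 15, high: 25 };

// Rough estimate for a full rebuild of `entryCount` entries.
// Assumption: cost scales linearly with the number of entries.
function estimateFullIndexCost(entryCount) {
  if (!Number.isFinite(entryCount) || entryCount < 0) {
    throw new RangeError('entryCount must be a non-negative number');
  }
  return {
    low: (entryCount * FULL_INDEX_COST_PER_1000.low) / 1000,
    high: (entryCount * FULL_INDEX_COST_PER_1000.high) / 1000,
  };
}
```

Under that assumption, 2500 entries would come to roughly $37.50 to $62.50. The incremental row works out to about $0.01 to $0.02 per entry, which is in the same range as the per-entry cost of a full index.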

Automation

GitHub Action

On every push to main that touches the documentation folders (for example, a merged PR):

yaml
name: Update Emily Knowledge
on:
  push:
    branches: [main]
    paths:
      - 'features/**'
      - 'fields/**'
      - 'relations/**'
      - 'troubleshooting/**'

jobs:
  update:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run indexing
        run: node scripts/full-pipeline.cjs --incremental

Manual Trigger

bash
# From emily-knowledge-base directory
npm run index

# Or from proxy directory
npm run knowledge:update

Verification

After indexing, verify:

1. Check Entry Count

bash
curl -X GET "$AZURE_SEARCH_ENDPOINT/indexes/emily-kb-merchant-test/docs/\$count?api-version=2023-11-01" \
  -H "api-key: $AZURE_SEARCH_API_KEY"

2. Test Search

bash
curl -X POST "$AZURE_SEARCH_ENDPOINT/indexes/emily-kb-merchant-test/docs/search?api-version=2023-11-01" \
  -H "Content-Type: application/json" \
  -H "api-key: $AZURE_SEARCH_API_KEY" \
  -d '{"search": "order capacity", "top": 5}'

3. Test Emily

Ask Emily questions about the indexed content:

  • "How do I set up order capacity?"
  • "What does the time slot duration field do?"
  • "Why are customers not seeing available times?"

Troubleshooting

Entries Not Appearing

  1. Check upload logs for errors
  2. Verify index exists
  3. Check field names match schema
  4. Wait for indexing (can take 1-2 minutes)

Search Not Finding Content

  1. Check embedding was generated
  2. Verify content is not empty
  3. Check semantic configuration
  4. Try different search terms

Translations Missing

  1. Check translation API key
  2. Verify source content exists
  3. Check for rate limiting
  4. Review translation logs
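
When rate limiting turns out to be the cause, a common remedy is to retry with exponential backoff. The sketch below shows the idea. It is not taken from translate.cjs: the helper names, the default delays, and the assumption that a rate-limited call throws an error with a numeric `status` of 429 are all hypothetical.

```javascript
// Delay before retry number `attempt` (0-based): baseMs * 2^attempt, capped at maxMs.
function backoffDelayMs(attempt, baseMs = 500, maxMs = 30000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Hypothetical wrapper: retries an async call when it reports rate limiting (HTTP 429).
// Assumes the thrown error carries a numeric `status` property.
async function withRetry(fn, maxRetries = 5) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      // Give up on non-rate-limit errors or once the retry budget is spent.
      if (err.status !== 429 || attempt >= maxRetries) {
        throw err;
      }
      await new Promise((resolve) => setTimeout(resolve, backoffDelayMs(attempt)));
    }
  }
}
```

A translation call would then be wrapped as `withRetry(() => translateEntry(entry))`, where `translateEntry` stands in for whatever function performs the request.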