Knowledge Indexing Process
How to convert documentation into Emily's searchable knowledge base.
Overview
```
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│    Markdown     │     │   Extraction    │     │  Azure Search   │
│  Documentation  │ ──▶ │  & Translation  │ ──▶ │  Vector Store   │
└─────────────────┘     └─────────────────┘     └─────────────────┘
                                                         │
                                                         ▼
                                                ┌─────────────────┐
                                                │      Emily      │
                                                │    Assistant    │
                                                └─────────────────┘
```
Step 1: Documentation
All knowledge starts as markdown files in this repository:
```
upvendo-kb/
├── features/          # Feature documentation
├── fields/            # Field reference
├── relations/         # Cross-system relations
└── troubleshooting/   # Issue guides
```
Documentation Standards
- Use consistent structure (see templates)
- Include all required sections
- Write for clarity (a child should understand)
- Provide examples
- Document customer impact
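Some of these standards can be checked automatically before indexing. The sketch below assumes that required sections are written as `##` headings and reports which ones a file is missing. The section names in `REQUIRED_SECTIONS` are hypothetical placeholders. The real list should come from the templates.

```typescript
// Hypothetical required headings; take the real list from the templates.
const REQUIRED_SECTIONS: string[] = ["Overview", "Examples", "Customer Impact"];

// Collect the text of every level-2 heading ("## Title"), lower-cased.
function collectHeadings(markdown: string): string[] {
  const headings: string[] = [];
  for (const line of markdown.split("\n")) {
    const match = /^##\s+(.+?)\s*$/.exec(line);
    if (match) {
      headings.push(match[1].toLowerCase());
    }
  }
  return headings;
}

// Return the required sections that never appear as a level-2 heading.
function findMissingSections(markdown: string): string[] {
  const headings = collectHeadings(markdown);
  return REQUIRED_SECTIONS.filter(
    (section) => headings.indexOf(section.toLowerCase()) === -1
  );
}

const sample = "# Order Capacity\n\n## Overview\nLimits orders per time slot.\n\n## Examples\n...";
console.log(findMissingSections(sample)); // prints [ 'Customer Impact' ]
```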
Step 2: Extraction
Scripts in upvendo-proxy-emily/scripts/knowledge-system/ extract knowledge:
Extract from Documentation
```bash
# Parse markdown files into structured JSON
node scripts/knowledge-system/extract-from-docs.cjs
```
This creates entries for:
- Features (overview)
- Fields (individual field help)
- Business logic (how things work)
- Troubleshooting (problem/solution)
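The following minimal sketch shows the kind of entry this step produces by turning one markdown file into a knowledge entry. It is not taken from `extract-from-docs.cjs`. The `KnowledgeEntry` shape, the id format and the directory-to-type mapping are assumptions, chosen to match the `type` values in the index schema. The mapping of `relations/` to `workflow` is the least certain of these.

```typescript
// Hypothetical entry shape, loosely following the index fields (id, type, name, content).
interface KnowledgeEntry {
  id: string;
  type: string;
  name: string;
  content: string;
}

// Assumed directory-to-type mapping; "relations" -> "workflow" is a guess.
const TYPE_BY_DIRECTORY: { [dir: string]: string } = {
  features: "feature",
  fields: "field",
  relations: "workflow",
  troubleshooting: "troubleshooting",
};

// Turn one markdown file (path relative to the repository root) into an entry,
// or return null for files outside the indexed directories.
function toEntry(path: string, markdown: string): KnowledgeEntry | null {
  const parts = path.split("/");
  if (parts.length < 2 || !Object.prototype.hasOwnProperty.call(TYPE_BY_DIRECTORY, parts[0])) {
    return null;
  }
  const entryType = TYPE_BY_DIRECTORY[parts[0]];
  const slug = parts[parts.length - 1].replace(/\.md$/, "");
  // The first "# " heading becomes the display name; fall back to the file name.
  const heading = /^#\s+(.+)$/m.exec(markdown);
  const title = heading ? heading[1].trim() : slug;
  // Everything after that heading becomes the entry content.
  const body = heading ? markdown.slice(heading.index + heading[0].length) : markdown;
  return { id: entryType + "-" + slug, type: entryType, name: title, content: body.trim() };
}

console.log(toEntry("features/order-capacity.md", "# Order Capacity\n\nLimits how many orders fit in one time slot."));
```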
Extract from Source Code
```bash
# Parse Vue/PHP files for field definitions
node scripts/knowledge-system/extract-all.cjs
```
This extracts:
- Field names and types
- Validation rules
- Default values
- Route paths
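The two extraction passes describe the same fields from different angles, so their results eventually need to be combined. The following is a minimal sketch of such a merge, assuming both sides are keyed by field name. `SourceField`, `FieldEntry` and `mergeFieldMetadata` are hypothetical names, not part of `extract-all.cjs`.

```typescript
// Hypothetical metadata extracted from Vue/PHP source for one field.
interface SourceField {
  fieldName: string;
  dataType: string;
  validation: string[];
  defaultValue?: string;
  route?: string;
}

// Hypothetical field entry produced from the markdown documentation.
interface FieldEntry {
  fieldName: string;
  content: string;
  route?: string;
}

// Append source-derived facts (type, validation, default) to each documented
// field and fill in the route if the docs did not give one. Fields without
// matching source metadata pass through unchanged.
function mergeFieldMetadata(entries: FieldEntry[], sources: SourceField[]): FieldEntry[] {
  const byName: { [fieldName: string]: SourceField | undefined } = Object.create(null);
  for (const src of sources) {
    byName[src.fieldName] = src;
  }
  return entries.map((entry) => {
    const src = byName[entry.fieldName];
    if (!src) {
      return entry;
    }
    const facts: string[] = ["Type: " + src.dataType];
    if (src.validation.length > 0) {
      facts.push("Validation: " + src.validation.join(", "));
    }
    if (src.defaultValue !== undefined) {
      facts.push("Default: " + src.defaultValue);
    }
    return {
      fieldName: entry.fieldName,
      content: entry.content + "\n\n" + facts.join("\n"),
      route: entry.route || src.route,
    };
  });
}
```

This design appends facts taken from the code instead of asking writers to restate them. That keeps the defaults and validation rules in the help text aligned with what the backoffice actually enforces.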
Step 3: Content Generation
For entries that need enrichment:
```bash
# Generate detailed content using GPT-4
node scripts/knowledge-system/generate-content.cjs
```
The script:
- Expands brief descriptions
- Adds examples
- Creates troubleshooting tips
- Ensures consistent quality
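The following is a minimal sketch of the enrichment step. It assumes that "needs enrichment" means the content is shorter than a hypothetical length threshold. It also assumes the script calls the Azure OpenAI chat completions REST endpoint. The deployment name, `api-version`, threshold and prompt wording are all assumptions. The function only builds the request and does not send it.

```typescript
// Minimal shape of an entry coming out of the extraction step (hypothetical).
interface DraftEntry {
  id: string;
  name: string;
  content: string;
}

interface HttpRequest {
  url: string;
  method: string;
  headers: { [header: string]: string };
  body: string;
}

// Hypothetical cut-off: shorter content counts as a "brief description".
const MIN_CONTENT_LENGTH = 400;

function needsEnrichment(entry: DraftEntry): boolean {
  return entry.content.trim().length < MIN_CONTENT_LENGTH;
}

// Build (but do not send) an Azure OpenAI chat completions request asking the
// model to expand one entry. Deployment name and api-version are assumptions.
function buildEnrichmentRequest(
  entry: DraftEntry,
  endpoint: string,
  apiKey: string,
  deployment: string = "gpt-4",
  apiVersion: string = "2024-02-01"
): HttpRequest {
  const messages = [
    {
      role: "system",
      content:
        "You write help articles for a backoffice. Expand the draft, add a concrete " +
        "example and troubleshooting tips, and keep every fact in the draft unchanged.",
    },
    { role: "user", content: "Title: " + entry.name + "\n\nDraft:\n" + entry.content },
  ];
  return {
    url:
      endpoint.replace(/\/+$/, "") + "/openai/deployments/" + encodeURIComponent(deployment) +
      "/chat/completions?api-version=" + apiVersion,
    method: "POST",
    headers: { "Content-Type": "application/json", "api-key": apiKey },
    body: JSON.stringify({ messages: messages, temperature: 0.3 }),
  };
}
```

In a pipeline this would typically be applied as `entries.filter(needsEnrichment)` followed by one request per remaining entry. A low temperature is a judgment call aimed at the "consistent quality" goal.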
Step 4: Translation
Translate to all supported languages:
```bash
# Translate to NL, FR, DE, ES, PT, IT
node scripts/knowledge-system/translate.cjs
```
Languages:
- English (EN) - Source
- Dutch (NL)
- French (FR)
- German (DE)
- Spanish (ES)
- Portuguese (PT)
- Italian (IT)
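In the index, English stays in the unsuffixed `name` and `content` fields, while translations go into language-suffixed fields such as `name_nl` and `content_nl`. The following is a minimal sketch of that flattening. It assumes translations arrive as a map from language code to `{ name, content }`, which is a hypothetical input shape. Every suffixed field it emits must also exist in the index.

```typescript
interface Translation {
  name: string;
  content: string;
}

// Flatten an English source entry plus its translations into one index
// document: English keeps the plain field names, other languages get a
// "_<code>" suffix. Languages without a translation are omitted rather than
// sent as empty strings.
function flattenTranslations(
  english: Translation,
  translations: { [lang: string]: Translation },
  languages: string[]
): { [field: string]: string } {
  const fields: { [field: string]: string } = {
    name: english.name,
    content: english.content,
  };
  for (const lang of languages) {
    const t = translations[lang];
    if (!t) {
      continue;
    }
    fields["name_" + lang] = t.name;
    fields["content_" + lang] = t.content;
  }
  return fields;
}

const flat = flattenTranslations(
  { name: "Order capacity", content: "Limits how many orders fit in one time slot." },
  { nl: { name: "Bestelcapaciteit", content: "Beperkt hoeveel bestellingen in één tijdslot passen." } },
  ["nl", "fr", "de", "es", "pt", "it"]
);
console.log(Object.keys(flat)); // prints [ 'name', 'content', 'name_nl', 'content_nl' ]
```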
Step 5: Embedding Generation
Generate vector embeddings for semantic search:
```bash
# Create embeddings using Azure OpenAI
node scripts/knowledge-system/generate-embeddings.cjs
```
Model: text-embedding-ada-002 (1536-dimensional vectors, matching the embedding field in the index)
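For reference, an Azure OpenAI embeddings request targets a deployment of the model, not the model name itself. The sketch below builds such a request. It then checks that the returned vector has the 1536 dimensions the index expects; a mismatch would otherwise most likely surface later as an upload error. The deployment name, `api-version` and the choice to embed `name` and `content` together are assumptions.

```typescript
interface HttpRequest {
  url: string;
  method: string;
  headers: { [header: string]: string };
  body: string;
}

interface EmbeddingResponse {
  data: { index: number; embedding: number[] }[];
}

// text-embedding-ada-002 returns 1536-dimensional vectors, matching the index.
const EXPECTED_DIMENSIONS = 1536;

// Hypothetical choice: embed the title together with the body text.
function embeddingInput(entry: { name: string; content: string }): string {
  return entry.name + "\n\n" + entry.content;
}

// Build (but do not send) an Azure OpenAI embeddings request. The deployment
// name and api-version are assumptions; use the ones configured for your resource.
function buildEmbeddingRequest(
  text: string,
  endpoint: string,
  apiKey: string,
  deployment: string = "text-embedding-ada-002",
  apiVersion: string = "2024-02-01"
): HttpRequest {
  return {
    url:
      endpoint.replace(/\/+$/, "") + "/openai/deployments/" + encodeURIComponent(deployment) +
      "/embeddings?api-version=" + apiVersion,
    method: "POST",
    headers: { "Content-Type": "application/json", "api-key": apiKey },
    body: JSON.stringify({ input: text }),
  };
}

// Take the vector from a response and fail early on a dimension mismatch.
function extractEmbedding(response: EmbeddingResponse): number[] {
  if (response.data.length === 0) {
    throw new Error("Embedding response contained no vectors");
  }
  const vector = response.data[0].embedding;
  if (vector.length !== EXPECTED_DIMENSIONS) {
    throw new Error("Expected " + EXPECTED_DIMENSIONS + " dimensions, got " + vector.length);
  }
  return vector;
}
```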
Step 6: Upload to Azure Search
Upload to the vector store:
```bash
# Upload to Azure AI Search
node scripts/knowledge-system/upload-knowledge.cjs
```
Index: emily-kb-merchant-test
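The following sketches the upload side, assuming the REST API rather than an SDK. Documents go to the index's `docs/index` endpoint in batches of at most 1000. Each document carries an `@search.action`. The use of `mergeOrUpload` and the function names are assumptions about how `upload-knowledge.cjs` could work. They are not a description of that script.

```typescript
interface HttpRequest {
  url: string;
  method: string;
  headers: { [header: string]: string };
  body: string;
}

// "id" is the key field; other properties must match fields defined in the index.
interface IndexDocument {
  id: string;
  [field: string]: unknown;
}

// Azure AI Search accepts at most 1000 documents per indexing request.
const MAX_BATCH_SIZE = 1000;

// Split documents into batches and build (but do not send) one indexing
// request per batch. "mergeOrUpload" updates existing documents and inserts new ones.
function buildUploadRequests(
  endpoint: string,
  apiKey: string,
  indexName: string,
  documents: IndexDocument[],
  batchSize: number = MAX_BATCH_SIZE
): HttpRequest[] {
  const url = endpoint.replace(/\/+$/, "") + "/indexes/" + indexName + "/docs/index?api-version=2023-11-01";
  const requests: HttpRequest[] = [];
  for (let start = 0; start < documents.length; start += batchSize) {
    const batch = documents.slice(start, start + batchSize).map((doc) => {
      const action: { [field: string]: unknown } = { "@search.action": "mergeOrUpload" };
      for (const key of Object.keys(doc)) {
        action[key] = doc[key];
      }
      return action;
    });
    requests.push({
      url: url,
      method: "POST",
      headers: { "Content-Type": "application/json", "api-key": apiKey },
      body: JSON.stringify({ value: batch }),
    });
  }
  return requests;
}
```

The service can respond with HTTP 207 when only some documents in a batch fail. The per-document results in the response body are therefore worth logging, because they are the first place to look when entries do not appear.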
Full Pipeline
Run the complete pipeline:
```bash
# Extract, generate, translate, embed, upload
node scripts/knowledge-system/full-pipeline.cjs
```
Or with options:
```bash
# Only process changed files
node scripts/knowledge-system/full-pipeline.cjs --incremental

# Full rebuild
node scripts/knowledge-system/full-pipeline.cjs --full

# Specific feature only
node scripts/knowledge-system/full-pipeline.cjs --feature order-capacity
```
Azure Search Schema
Index Fields
| Field | Type | Purpose |
|---|---|---|
| id | String | Unique identifier |
| type | String | feature/field/workflow/troubleshooting |
| name | String | English name |
| name_nl | String | Dutch name |
| name_fr | String | French name |
| name_de | String | German name |
| content | String | English content |
| content_nl | String | Dutch content |
| content_fr | String | French content |
| content_de | String | German content |
| category | String | Feature category |
| route | String | Backoffice route |
| tags | Array | Search keywords |
| priority | Int | Ranking priority |
| embedding | Vector | 1536-dim embedding |
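Azure AI Search rejects documents that contain properties the index does not define. A quick check against the field list above before uploading can therefore save a round trip. The following is a minimal sketch. `SCHEMA_FIELDS` mirrors the table, so it would need extending if the live index also has fields for the other translated languages.

```typescript
// Field names from the index table; keep in sync with the live index
// (add name_es, content_es, etc. if those fields exist there).
const SCHEMA_FIELDS: string[] = [
  "id", "type", "name", "name_nl", "name_fr", "name_de",
  "content", "content_nl", "content_fr", "content_de",
  "category", "route", "tags", "priority", "embedding",
];

const ENTRY_TYPES: string[] = ["feature", "field", "workflow", "troubleshooting"];

// Return a list of problems; an empty list means the document looks uploadable.
function checkAgainstSchema(candidate: { [field: string]: unknown }): string[] {
  const problems: string[] = [];
  for (const key of Object.keys(candidate)) {
    // Skip request annotations such as "@search.action".
    if (key.charAt(0) === "@") {
      continue;
    }
    if (SCHEMA_FIELDS.indexOf(key) === -1) {
      problems.push("Unknown field: " + key);
    }
  }
  const id = candidate["id"];
  if (typeof id !== "string" || id === "") {
    problems.push("Missing id");
  } else if (!/^[A-Za-z0-9_\-=]+$/.test(id)) {
    // Document keys may only contain letters, digits, "_", "-" and "=".
    problems.push("Invalid characters in id: " + id);
  }
  if (ENTRY_TYPES.indexOf(String(candidate["type"])) === -1) {
    problems.push("Unexpected type: " + String(candidate["type"]));
  }
  return problems;
}
```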
Semantic Configuration
```json
{
  "semantic": {
    "configurations": [
      {
        "name": "default",
        "prioritizedFields": {
          "titleField": { "fieldName": "name" },
          "prioritizedContentFields": [{ "fieldName": "content" }]
        }
      }
    ]
  }
}
```
Cost Estimates
| Operation | Cost |
|---|---|
| Initial full index (~1000 entries) | ~$15-25 |
| Incremental update (10 entries) | ~$0.10-0.20 |
| Emily chat query | ~$0.01-0.03 |
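An incremental update is cheap because content generation, translation and embedding only run for entries that changed. One way an `--incremental` mode could decide what changed is to store a content hash per entry from the previous run and compare against it. The following sketches that idea and is not a description of how `full-pipeline.cjs` detects changes. FNV-1a is used here only to keep the example dependency-free. A real script would more likely use SHA-256 from Node's `crypto` module.

```typescript
// FNV-1a (32-bit) over UTF-16 code units; fine for change detection, not for
// security. The shift-and-add line multiplies by the FNV prime 16777619.
function contentHash(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash += (hash << 1) + (hash << 4) + (hash << 7) + (hash << 8) + (hash << 24);
  }
  return (hash >>> 0).toString(16);
}

interface ChangeSet {
  changed: string[]; // new or modified entry ids
  removed: string[]; // ids present in the last run but gone now
}

// previous: id -> hash recorded after the last successful run.
// current: id -> source text for this run.
function diffEntries(
  previous: { [id: string]: string },
  current: { [id: string]: string }
): ChangeSet {
  const changed: string[] = [];
  const removed: string[] = [];
  for (const id of Object.keys(current)) {
    if (previous[id] !== contentHash(current[id])) {
      changed.push(id);
    }
  }
  for (const id of Object.keys(previous)) {
    if (!Object.prototype.hasOwnProperty.call(current, id)) {
      removed.push(id);
    }
  }
  return { changed, removed };
}
```

Ids in `removed` would need a `delete` action in the next upload batch. Otherwise, stale entries stay in the index.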
Automation
GitHub Action
On every push to main that touches the documentation folders (which includes merged PRs):
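```yaml
name: Update Emily Knowledge

on:
  push:
    branches: [main]
    paths:
      - 'features/**'
      - 'fields/**'
      - 'relations/**'
      - 'troubleshooting/**'

jobs:
  update:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - name: Install dependencies
        run: npm ci
      - name: Run indexing
        # Assumes the pipeline scripts are available in this checkout
        run: node scripts/full-pipeline.cjs --incremental
        env:
          AZURE_SEARCH_ENDPOINT: ${{ secrets.AZURE_SEARCH_ENDPOINT }}
          AZURE_SEARCH_API_KEY: ${{ secrets.AZURE_SEARCH_API_KEY }}
          # Azure OpenAI credentials are also needed; names depend on the scripts
```
Manual Trigger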
```bash
# From emily-knowledge-base directory
npm run index

# Or from proxy directory
npm run knowledge:update
```
Verification
After indexing, verify:
1. Check Entry Count
```bash
curl -X GET "$AZURE_SEARCH_ENDPOINT/indexes/emily-kb-merchant-test/docs/\$count?api-version=2023-11-01" \
  -H "api-key: $AZURE_SEARCH_API_KEY"
```
2. Test Search
```bash
curl -X POST "$AZURE_SEARCH_ENDPOINT/indexes/emily-kb-merchant-test/docs/search?api-version=2023-11-01" \
  -H "Content-Type: application/json" \
  -H "api-key: $AZURE_SEARCH_API_KEY" \
  -d '{"search": "order capacity", "top": 5}'
```
3. Test Emily
Ask Emily questions about the indexed content:
- "How do I set up order capacity?"
- "What does the time slot duration field do?"
- "Why are customers not seeing available times?"
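The keyword test above exercises only full-text search. A query closer to what a retrieval-based assistant typically sends combines three things: the question text, its embedding, and the `default` semantic configuration. The sketch below only builds that request body. Several points are assumptions. It is not known whether Emily's proxy queries the index this way. The query vector must come from the same `text-embedding-ada-002` deployment used at indexing time. Semantic ranking must be enabled on the search service.

```typescript
interface VectorQuery {
  kind: "vector";
  vector: number[];
  fields: string;
  k: number;
}

interface HybridSearchBody {
  search: string;
  queryType: "semantic";
  semanticConfiguration: string;
  vectorQueries: VectorQuery[];
  top: number;
  select: string;
}

// Combine keyword search, vector search on the "embedding" field, and
// semantic re-ranking in one request body (api-version 2023-11-01 shape).
function buildHybridSearch(question: string, questionVector: number[], resultCount: number = 5): HybridSearchBody {
  return {
    search: question, // keyword half of the hybrid query
    queryType: "semantic", // re-rank with the semantic configuration
    semanticConfiguration: "default",
    vectorQueries: [{ kind: "vector", vector: questionVector, fields: "embedding", k: resultCount }],
    top: resultCount,
    select: "id,name,content,route",
  };
}

function searchUrl(endpoint: string, indexName: string): string {
  return endpoint.replace(/\/+$/, "") + "/indexes/" + indexName + "/docs/search?api-version=2023-11-01";
}
```

Send the body as JSON with a POST to `searchUrl(...)`, using the same `api-key` header as the curl commands.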
Troubleshooting
Entries Not Appearing
- Check upload logs for errors
- Verify index exists
- Check field names match schema
- Wait for indexing (can take 1-2 minutes)
Search Not Finding Content
- Check embedding was generated
- Verify content is not empty
- Check semantic configuration
- Try different search terms
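The first two checks can be run over the prepared entries before upload. The following is a minimal sketch that assumes each entry is a plain object with optional `content` and `embedding` properties, as in the index schema. `findUnsearchable` is a hypothetical helper.

```typescript
interface PreparedEntry {
  id: string;
  content?: string;
  embedding?: number[];
}

// List entries that will be hard to find: empty text gives keyword and
// semantic search nothing to match, and a missing vector means vector
// queries cannot match the entry at all.
function findUnsearchable(entries: PreparedEntry[], dimensions: number = 1536): string[] {
  const issues: string[] = [];
  for (const entry of entries) {
    if (!entry.content || entry.content.trim() === "") {
      issues.push(entry.id + ": empty content");
    }
    if (!entry.embedding) {
      issues.push(entry.id + ": no embedding");
    } else if (entry.embedding.length !== dimensions) {
      issues.push(entry.id + ": embedding has " + entry.embedding.length + " dimensions, expected " + dimensions);
    }
  }
  return issues;
}
```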
Translations Missing
- Check translation API key
- Verify source content exists
- Check for rate limiting
- Review translation logs
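To see which entries are affected, compare each entry's language fields against the expected set. The following is a minimal sketch assuming the `content_<code>` naming from the index schema. The language list is a parameter because the schema table lists NL, FR and DE fields, while the translation step also targets ES, PT and IT.

```typescript
// For each entry, list the language codes whose translated content is
// missing or blank. English ("content") is the source and is not checked.
function missingTranslations(
  entries: { [field: string]: unknown }[],
  languages: string[]
): { [id: string]: string[] } {
  const report: { [id: string]: string[] } = {};
  for (const entry of entries) {
    const missing = languages.filter((lang) => {
      const value = entry["content_" + lang];
      return typeof value !== "string" || value.trim() === "";
    });
    if (missing.length > 0) {
      report[String(entry["id"])] = missing;
    }
  }
  return report;
}
```

An entry that is missing every language usually points to a failed translation run, such as one caused by an API key or rate-limit problem. Scattered gaps are more likely individual failures worth finding in the translation logs.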