Knowledge Sources
Custom Knowledge Sources
Knowledge Sources provide custom context that Cuppa references when generating content. Upload documents or paste text to give the AI specific information about your products, processes, or expertise that isn't available on the public web.
Why Knowledge Sources Matter
Public AI models only know what's in their training data. They don't know:
Your specific product features and pricing
Internal processes and methodologies
Proprietary research and data
Company policies and guidelines
Industry-specific terminology you use
Your customer support phone number
Knowledge Sources bridge this gap. When you add a Knowledge Source, Cuppa splits your content into searchable chunks, stores them as vector embeddings, and retrieves the most relevant pieces during article generation.
This is RAG (Retrieval-Augmented Generation) in action.
How It Works
When you upload a Knowledge Source:
Chunking: Your content is split into smaller pieces (roughly 1,000 characters each for uploaded files and 400 characters each for pasted text)
Embedding: Each chunk is converted into a vector embedding using an OpenAI embedding model
Storage: Embeddings are stored in your team's knowledge base
Retrieval: During generation, Cuppa searches for chunks relevant to your topic and includes them in the AI prompt
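The indexing steps above (chunking, embedding, storage) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not Cuppa's implementation: chunks are cut at fixed character offsets with no overlap, and `toy_embed` is a hypothetical hashed bag-of-words stand-in for the OpenAI embedding call, so it captures shared words rather than meaning. The names `chunk`, `toy_embed`, `StoredChunk`, and `index_source` are all illustrative.

```python
# Minimal sketch of indexing: chunk -> embed -> store.
# Assumptions: fixed-size chunks with no overlap; toy_embed is a hypothetical
# stand-in for the OpenAI embedding call (it counts words, not meaning).
import math
import re
import zlib
from dataclasses import dataclass

FILE_CHUNK_CHARS = 1000  # roughly 1,000 characters for files
TEXT_CHUNK_CHARS = 400   # roughly 400 characters for pasted text
DIM = 256                # toy vector size (hypothetical)


def chunk(content: str, size: int) -> list[str]:
    """Split content into consecutive slices of at most `size` characters."""
    return [content[i:i + size] for i in range(0, len(content), size)]


def toy_embed(text: str) -> list[float]:
    """Hash each word into one of DIM buckets, then L2-normalize the counts."""
    vec = [0.0] * DIM
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        vec[zlib.crc32(word.encode()) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # avoid dividing by zero
    return [v / norm for v in vec]


@dataclass
class StoredChunk:
    source_name: str
    text: str
    vector: list[float]


def index_source(name: str, content: str, is_file: bool) -> list[StoredChunk]:
    """Chunk a source with the size for its type and embed every chunk."""
    size = FILE_CHUNK_CHARS if is_file else TEXT_CHUNK_CHARS
    return [StoredChunk(name, piece, toy_embed(piece)) for piece in chunk(content, size)]
```

For example, `index_source("Support Contacts", text, is_file=False)` returns one stored chunk per 400-character slice of `text`. Production chunkers usually prefer sentence or paragraph boundaries and may overlap neighbouring chunks; fixed offsets just keep the sketch short.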
Chunk Limits
Each Knowledge Source can store up to 100 chunks (approximately 50,000 characters or 12,500 tokens). If your document exceeds this limit:
The first 100 chunks are indexed
Remaining content is not searchable
You'll see a warning: "100 chunks indexed (limited from X)"
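A short sketch of how the limit behaves, assuming chunks are kept in document order and that the number in "limited from X" is the total chunk count before truncation. `cap_chunks` is a hypothetical helper that reproduces the warning format quoted above.

```python
# Sketch of the 100-chunk cap. cap_chunks is a hypothetical helper; it assumes
# chunks are in document order and that "limited from X" reports the total.
MAX_CHUNKS = 100


def cap_chunks(chunks: list[str], limit: int = MAX_CHUNKS) -> tuple[list[str], str]:
    """Keep the first `limit` chunks and build the indexing-stats message."""
    kept = chunks[:limit]  # everything after the limit is never searchable
    if len(chunks) > limit:
        return kept, f"{len(kept)} chunks indexed (limited from {len(chunks)})"
    return kept, f"{len(kept)} chunks indexed"
```

With 130 chunks, `cap_chunks(chunks)[1]` is "100 chunks indexed (limited from 130)", and the last 30 chunks are dropped.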
Tip: For large documents, split them into multiple focused Knowledge Sources for better coverage.
Source Types
Text
Paste content directly into Cuppa.
Best for:
Product descriptions
FAQ content
Style guidelines
Key messaging
Contact information
Boilerplate text
Example:
File Upload
Upload PDF, TXT, or Markdown files up to 50MB.
Best for:
Product documentation
White papers
Research reports
Employee handbooks
Training materials
Supported formats:
PDF: Text-based PDFs only. Scanned/image PDFs are not supported (no text to extract).
TXT: Plain text files
Markdown: .md files, treated as plain text
Important: PDFs must contain actual text, not images of text. If you can't select/copy text in your PDF, it's image-based and won't work. Use a text-based export or OCR tool first.
Adding Knowledge Sources
Navigate to AI Instructions > Brand Knowledge
Click Create new knowledge source
Choose source type (Text or File)
Provide content and metadata:
Name: Descriptive name (e.g., "Product Features 2024")
Description: What this source contains
Click Save
After saving, you'll see indexing stats showing how many chunks were created.
Understanding Indexing Stats
After upload, each source displays:
"X chunks indexed": Your content was fully indexed
"X chunks indexed (limited from Y)": Content exceeded the 100-chunk limit
If limited, consider splitting the document into smaller, topic-focused sources.
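If your document is a Markdown or plain-text file with headings, a small script can split it before upload. The sketch below is hypothetical and not part of Cuppa: it breaks the document at level-1 and level-2 headings and cuts any section still over a character budget into numbered parts. The budget reuses the ~50,000-character estimate from Chunk Limits.

```python
# Hypothetical pre-upload helper: split a Markdown/text document into
# topic-focused parts at "#" and "##" headings.
import re

MAX_SOURCE_CHARS = 50_000  # ~100 chunks, per the estimate in Chunk Limits


def split_by_heading(markdown: str, max_chars: int = MAX_SOURCE_CHARS) -> list[tuple[str, str]]:
    """Return (title, body) pairs, one per top-level section."""
    sections: list[tuple[str, str]] = []
    title, lines = "Introduction", []  # text before the first heading
    for line in markdown.splitlines():
        heading = re.match(r"#{1,2}\s+(.*)", line)
        if heading:
            if "".join(lines).strip():  # close the previous section
                sections.append((title, "\n".join(lines)))
            title, lines = heading.group(1).strip(), []
        lines.append(line)  # headings stay at the top of their own section
    if "".join(lines).strip():
        sections.append((title, "\n".join(lines)))

    # Fallback: cut any section that is still too long into numbered parts.
    parts: list[tuple[str, str]] = []
    for title, body in sections:
        if len(body) <= max_chars:
            parts.append((title, body))
            continue
        for n, start in enumerate(range(0, len(body), max_chars), start=1):
            parts.append((f"{title} (part {n})", body[start:start + max_chars]))
    return parts
```

Each (title, body) pair can then be saved as its own file and uploaded with a descriptive name such as "Product Docs: Billing". Splitting at headings is what keeps each source on one topic; the hard cut for oversized sections is only a fallback.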
What to Include
Focus on information the AI can't find elsewhere:
✅ Product specifics: Features, pricing, specifications, SKUs
✅ Contact information: Phone numbers, emails, addresses
✅ Brand guidelines: Terminology, messaging, values
✅ FAQs: Common questions with approved answers
✅ Case studies: Customer success stories with metrics
✅ Technical docs: How things work, integrations, specs
✅ Competitive positioning: How you differ from competitors
✅ Policies: Return policies, guarantees, terms
What NOT to Include
❌ Sensitive data: Passwords, API keys, personal customer information
❌ Massive documents: Split into focused topics instead
❌ Outdated information: Causes incorrect outputs
❌ Conflicting information: Creates inconsistent content
❌ Image-based PDFs: Scanned documents without extractable text
Best Practices
Keep Sources Focused
Smaller, topic-specific sources retrieve more accurately than massive documents. Note: A brand can have multiple Knowledge Sources, but only one can be selected per generation when you build content.
Product catalog: One source per product line
FAQ documents: Group by topic (billing, features, support)
Style guidelines: Single comprehensive source
Technical docs: Split by feature area
Use Descriptive Names
Good: "Enterprise Pricing 2026" or "Return Policy FAQ"
Bad: "Document1" or "Info"
Names help you manage sources and help Cuppa understand context.
Update Regularly
Knowledge Sources reflect a point in time. Review quarterly:
Remove outdated sources
Update changed information
Add new products/features
Test Retrieval
After adding a source, test it in Agentic Chat:
"What is our phone number for customer support?"
If the answer is correct, your Knowledge Source is working.
How Knowledge Sources Are Used
During Article Generation
When generating content, Cuppa:
Analyzes your topic and keywords
Searches your Knowledge Sources for relevant information
Retrieves up to 12 of the most relevant chunks
Includes that context in the generation prompt
The AI sees your custom information alongside web research, creating content that's both current and accurate to your brand.
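Retrieval of this kind is typically a nearest-neighbour search over the stored embeddings. The sketch below ranks chunks by cosine similarity and keeps the top 12, assuming the vectors are already computed. The `min_score` cutoff is a hypothetical stand-in for the similarity threshold mentioned under Troubleshooting (its real value isn't documented here), and the template in `build_prompt` is illustrative only, not Cuppa's actual prompt.

```python
# Sketch of retrieval: rank stored chunks by cosine similarity to the query
# embedding and keep the best 12. min_score is a hypothetical threshold.
import heapq
import math

TOP_K = 12  # up to 12 of the most relevant chunks


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(
    query_vec: list[float],
    chunks: list[tuple[str, list[float]]],
    k: int = TOP_K,
    min_score: float = -1.0,  # -1.0 keeps every chunk (cosine lies in [-1, 1])
) -> list[tuple[float, str]]:
    """Return up to k (score, text) pairs, best first, dropping weak matches."""
    scored = [(cosine(query_vec, vec), text) for text, vec in chunks]
    eligible = [pair for pair in scored if pair[0] >= min_score]
    return heapq.nlargest(k, eligible, key=lambda pair: pair[0])


def build_prompt(topic: str, retrieved: list[tuple[float, str]]) -> str:
    """Prepend the retrieved chunks to the generation request (illustrative template)."""
    context = "\n\n".join(text for _, text in retrieved)
    return f"Brand knowledge:\n{context}\n\nWrite an article about: {topic}"
```

With toy two-dimensional vectors, `retrieve([1.0, 0.0], [("pricing", [1.0, 0.0]), ("support", [0.0, 1.0]), ("mixed", [1.0, 1.0])], k=2)` returns the "pricing" and "mixed" chunks, best first. Raising `min_score` is what can leave a source unreferenced even when it exists.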
In Agentic Chat
Chat can access your Knowledge Sources directly:
Ask questions about your products
Request content using specific sources
Fact-check against your documentation
Example prompts:
"Using our pricing documentation, write a comparison table"
"What does our product guide say about the enterprise tier?"
"Draft an email using our approved messaging"
Source Management
Updating Content
Text: Edit directly in Cuppa
File: Delete and re-upload
Troubleshooting
"Content not being referenced"
Possible causes:
Content isn't semantically relevant to your topic
Similarity threshold not met: no chunk scored as similar enough to the topic to be retrieved
Solutions:
Ensure your content uses terminology related to your topic
Mention the source explicitly in chat for testing
"Wrong information being used"
Cause: Outdated or conflicting sources.
Solution: Audit sources, remove outdated content, resolve conflicts.
"0 chunks indexed"
Cause: PDF is image-based (scanned), not text-based.
Solution: Use a PDF with actual text, or convert with an OCR tool first. If you can't select/copy text in your PDF viewer, it's image-based.
"X chunks indexed (limited from Y)"
Cause: Document exceeded the 100-chunk limit.
Solution: Split into multiple smaller Knowledge Sources organized by topic.
"File upload failed"
Check:
File is under 50MB
Format is PDF, TXT, or Markdown
PDF isn't password-protected
PDF contains actual text (not scanned images)
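You can run most of these checks locally before uploading. The sketch below is a hypothetical helper, not part of Cuppa: it checks the file extension, the 50MB limit (read conservatively as 50,000,000 bytes, an assumption), and empty files, and it flags PDFs whose bytes contain `/Encrypt`, a heuristic for password protection that can misfire. It cannot tell whether a PDF contains real text; use the select/copy test for that.

```python
# Hypothetical pre-upload check. Byte-level heuristics only, not a PDF parser.
from pathlib import PurePath

MAX_BYTES = 50_000_000  # 50MB, read as decimal megabytes (assumption)
ALLOWED_SUFFIXES = {".pdf", ".txt", ".md"}


def preflight(filename: str, data: bytes) -> list[str]:
    """Return a list of likely upload problems; an empty list means none found."""
    problems: list[str] = []
    suffix = PurePath(filename).suffix.lower()
    if suffix not in ALLOWED_SUFFIXES:
        problems.append(f"unsupported format: {suffix or '(no extension)'}")
    if len(data) > MAX_BYTES:
        problems.append(f"file is {len(data) / 1_000_000:.1f} MB, over the 50MB limit")
    if not data:
        problems.append("file is empty")
    # Encrypted PDFs reference an /Encrypt dictionary in their trailer.
    if suffix == ".pdf" and b"/Encrypt" in data:
        problems.append("PDF looks password-protected (contains /Encrypt)")
    return problems
```

Usage: with `p = pathlib.Path("handbook.pdf")`, call `preflight(p.name, p.read_bytes())` and fix anything it reports before uploading.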
Related Features
AI Instructions: Control generation settings and prompts
Brand Voice: Consistent tone and style
Agentic Chat: Chat that uses your knowledge