Knowledge Sources

Knowledge Sources provide custom context that Cuppa references when generating content. Upload documents or paste text to give the AI specific information about your products, processes, or expertise that isn’t available on the public web.

Why Knowledge Sources Matter

Public AI models only know what’s in their training data. They don’t know:

Your specific product features and pricing
Internal processes and methodologies
Proprietary research and data
Company policies and guidelines
Industry-specific terminology you use
Your customer support phone number

Knowledge Sources bridge this gap. When you add a Knowledge Source, Cuppa splits your content into searchable chunks, stores them as vector embeddings, and retrieves the most relevant pieces during article generation.

This is RAG (Retrieval-Augmented Generation) in action.

How It Works

When you upload a Knowledge Source:

Chunking: Your content is split into smaller pieces (roughly 1,000 characters each for files, 400 for text)
Embedding: Each chunk is converted into a vector embedding using OpenAI
Storage: Embeddings are stored in your team’s knowledge base
Retrieval: During generation, Cuppa searches for chunks relevant to your topic and includes them in the AI prompt
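
The four steps above can be sketched in a few lines of Python. This is only an illustration, not Cuppa's implementation: the function names are hypothetical, the chunker cuts at fixed character offsets, and a simple word-count vector stands in for the OpenAI embedding so the example runs without any API access.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, chunk_size: int = 1000) -> list[str]:
    """Split text into consecutive pieces of at most chunk_size characters."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


def toy_embed(chunk: str) -> Counter:
    """Stand-in for a real embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", chunk.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, store: list[tuple[str, Counter]], top_k: int = 3) -> list[str]:
    """Return the top_k stored chunks most similar to the query."""
    q = toy_embed(query)
    ranked = sorted(store, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:top_k]]


source = (
    "ContentFlow Pro offers unlimited team seats for $99/month. "
    "Contact support at 1-800-555-0123 for priority help. "
    "Real-time SEO scoring is available during editing."
)

# Chunking, embedding, and storage (a tiny chunk size keeps the demo readable).
chunks = chunk_text(source, chunk_size=60)
store = [(c, toy_embed(c)) for c in chunks]

# Retrieval: the best match is the chunk that mentions support.
best = retrieve("support phone number", store, top_k=1)
print(best[0])
```

A real embedding model matches on meaning rather than exact words, which is why wording your sources in the same terms as your topics still helps retrieval.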

Chunk Limits

Each Knowledge Source can store up to 100 chunks (approximately 50,000 characters or 12,500 tokens). If your document exceeds this limit:

The first 100 chunks are indexed
Remaining content is not searchable
You’ll see a warning: “100 chunks indexed (limited from X)”

Tip: For large documents, split them into multiple focused Knowledge Sources for better coverage.
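
As a rough sketch of the limit and the tip above, the snippet below formats the indexing message for a given chunk count and groups a long document's paragraphs into several smaller sources. Both helper names and the `max_chars` budget are hypothetical; pick a budget that keeps each source comfortably under the limit.

```python
def index_stats(total_chunks: int, limit: int = 100) -> str:
    """Format the indexing message for a source with total_chunks chunks."""
    if total_chunks > limit:
        return f"{limit} chunks indexed (limited from {total_chunks})"
    return f"{total_chunks} chunks indexed"


def split_into_sources(paragraphs: list[str], max_chars: int) -> list[list[str]]:
    """Group paragraphs in order so each group stays within max_chars.

    A single paragraph longer than max_chars still gets its own group.
    """
    groups: list[list[str]] = []
    current: list[str] = []
    size = 0
    for paragraph in paragraphs:
        # Start a new group when adding this paragraph would exceed the budget.
        if current and size + len(paragraph) > max_chars:
            groups.append(current)
            current, size = [], 0
        current.append(paragraph)
        size += len(paragraph)
    if current:
        groups.append(current)
    return groups


print(index_stats(42))
print(index_stats(137))
```

Splitting on paragraph boundaries is only a starting point; grouping by topic by hand usually gives more focused sources.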

Source Types

Text

Paste content directly into Cuppa.

Best for:

Product descriptions
FAQ content
Style guidelines
Key messaging
Contact information
Boilerplate text

Example:

Our flagship product, ContentFlow Pro, offers:
- Unlimited team seats ($99/month)
- AI-powered content optimization
- 50+ CMS integrations
- 24/7 priority support

Contact us: support@contentflow.com | 1-800-555-0123

Key differentiator: Only platform with real-time SEO scoring during editing.

File Upload

Upload PDF, TXT, or Markdown files up to 50MB.

Best for:

Product documentation
White papers
Research reports
Employee handbooks
Training materials

Supported formats:

Format	Notes
PDF	Text-based PDFs only. Scanned/image PDFs are not supported (no text to extract).
TXT	Plain text files
Markdown	.md files treated as text

Important: PDFs must contain actual text, not images of text. If you can’t select/copy text in your PDF, it’s image-based and won’t work. Use a text-based export or OCR tool first.

Adding Knowledge Sources

Navigate to AI Instructions > Brand Knowledge
Click Create new knowledge source
Choose source type (Text or File)
Provide content and metadata:
- Name: Descriptive name (e.g., “Product Features 2024”)
- Description: What this source contains
Click Save

After saving, you’ll see indexing stats showing how many chunks were created.

Understanding Indexing Stats

After upload, each source displays:

“X chunks indexed”: Your content was fully indexed
“X chunks indexed (limited from Y)”: Content exceeded the 100-chunk limit

If limited, consider splitting the document into smaller, topic-focused sources.

What to Include

Focus on information the AI can’t find elsewhere:

✅ Product specifics: Features, pricing, specifications, SKUs
✅ Contact information: Phone numbers, emails, addresses
✅ Brand guidelines: Terminology, messaging, values
✅ FAQs: Common questions with approved answers
✅ Case studies: Customer success stories with metrics
✅ Technical docs: How things work, integrations, specs
✅ Competitive positioning: How you differ from competitors
✅ Policies: Return policies, guarantees, terms

What NOT to Include

❌ Sensitive data: Passwords, API keys, personal customer information
❌ Massive documents: Split into focused topics instead
❌ Outdated information: Causes incorrect outputs
❌ Conflicting information: Creates inconsistent content
❌ Image-based PDFs: Scanned documents without extractable text

Best Practices

Keep Sources Focused

Smaller, topic-specific sources retrieve more accurately than massive documents. Note: a brand can have multiple Knowledge Sources, but only one can be selected for each generation when you build content.

Content Type	Recommended Approach
Product catalog	One source per product line
FAQ documents	Group by topic (billing, features, support)
Style guidelines	Single comprehensive source
Technical docs	Split by feature area

Use Descriptive Names

Good: “Enterprise Pricing 2026” or “Return Policy FAQ”
Bad: “Document1” or “Info”

Names help you manage sources and help Cuppa understand context.

Update Regularly

Knowledge Sources reflect a point in time. Review quarterly:

Remove outdated sources
Update changed information
Add new products/features

Test Retrieval

After adding a source, test it in Agentic Chat:

“What is our phone number for customer support?”

If the answer is correct, your Knowledge Source is working.

How Knowledge Sources Are Used

During Article Generation

When generating content, Cuppa:

Analyzes your topic and keywords
Searches your Knowledge Sources for relevant information
Retrieves up to 12 of the most relevant chunks
Includes that context in the generation prompt
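
A minimal sketch of the selection step, assuming each chunk already has a similarity score for the topic. The `ScoredChunk` type, the `min_score` value of 0.3, and the prompt layout are illustrative assumptions; only the cap of 12 chunks comes from the steps above.

```python
from dataclasses import dataclass


@dataclass
class ScoredChunk:
    text: str
    score: float  # similarity to the topic; higher means more relevant


def select_context(chunks: list[ScoredChunk], max_chunks: int = 12,
                   min_score: float = 0.3) -> list[ScoredChunk]:
    """Keep chunks at or above min_score, best first, capped at max_chunks."""
    relevant = [c for c in chunks if c.score >= min_score]
    relevant.sort(key=lambda c: c.score, reverse=True)
    return relevant[:max_chunks]


def build_prompt(topic: str, context: list[ScoredChunk]) -> str:
    """Assemble a prompt that places the retrieved chunks after the topic."""
    lines = [f"Topic: {topic}", "Context from Knowledge Sources:"]
    lines += [f"- {c.text}" for c in context]
    return "\n".join(lines)


# Twenty candidate chunks with scores from 0.0 to 0.95 (hypothetical values).
candidates = [ScoredChunk(f"chunk {i}", i / 20) for i in range(20)]
picked = select_context(candidates)
prompt = build_prompt("enterprise pricing", picked)
print(prompt)
```

If no chunk clears the threshold, the context list is empty and nothing from the Knowledge Source reaches the prompt, which is one way content can end up not being referenced.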

The AI sees your custom information alongside web research, creating content that’s both current and accurate to your brand.

In Agentic Chat

Chat can access your Knowledge Sources directly:

Ask questions about your products
Request content using specific sources
Fact-check against your documentation

Example prompts:

“Using our pricing documentation, write a comparison table”
“What does our product guide say about the enterprise tier?”
“Draft an email using our approved messaging”

Source Management

Updating Content

Source Type	How to Update
Text	Edit directly in Cuppa
File	Delete and re-upload

Troubleshooting

“Content not being referenced”

Possible causes:

Content isn’t semantically relevant to your topic
Similarity threshold not met

Solutions:

Ensure your content uses terminology related to your topic
Mention the source explicitly in chat for testing

“Wrong information being used”

Cause: Outdated or conflicting sources.

Solution: Audit sources, remove outdated content, resolve conflicts.

“0 chunks indexed”

Cause: PDF is image-based (scanned), not text-based.

Solution: Use a PDF with actual text, or convert with an OCR tool first. If you can’t select/copy text in your PDF viewer, it’s image-based.

“X chunks indexed (limited from Y)”

Cause: Document exceeded the 100-chunk limit.

Solution: Split into multiple smaller Knowledge Sources organized by topic.

“File upload failed”

Check:

File is under 50MB
Format is PDF, TXT, or Markdown
PDF isn’t password-protected
PDF contains actual text (not scanned images)
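
The first two checks can be run locally before uploading. The sketch below is a hypothetical pre-upload helper, not part of Cuppa; it assumes “50MB” means 50 × 1024 × 1024 bytes and that Markdown files use the .md extension. It does not detect password protection or image-based PDFs.

```python
from pathlib import Path

MAX_BYTES = 50 * 1024 * 1024  # assumption: "50MB" read as 50 MiB
ALLOWED_SUFFIXES = {".pdf", ".txt", ".md"}


def preflight(path: str) -> list[str]:
    """Return a list of problems found; an empty list means the checks passed."""
    file = Path(path)
    if not file.is_file():
        return [f"{path}: file not found"]
    problems = []
    if file.suffix.lower() not in ALLOWED_SUFFIXES:
        problems.append(f"{file.name}: format must be PDF, TXT, or Markdown")
    if file.stat().st_size > MAX_BYTES:
        problems.append(f"{file.name}: file is larger than 50MB")
    return problems
```

Calling `preflight("handbook.pdf")` returns an empty list when the file exists, has an accepted extension, and is within the size limit.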

Related

AI Instructions: Control generation settings and prompts
Brand Voice: Consistent tone and style
Agentic Chat: Chat that uses your knowledge
