Content Core

Community

lfnovo

Extract what matters from any media source

Publisher	lfnovo
Repository	`content-core`
Language	Python
Forks	28
Stars	149
Available tools	1
Transport type	stdio
Categories	AI/ML, Productivity
License	MIT
Links	GitHub


Content Core

Extract, process, and summarize content from URLs, files, and text through a unified async Python API, CLI, or MCP server.

Supported Formats

Category	Formats
Web	URLs, HTML pages, YouTube videos, Reddit posts
Documents	PDF, DOCX, PPTX, XLSX, EPUB, Markdown, plain text
Media	MP3, WAV, M4A, FLAC, OGG (audio); MP4, AVI, MOV, MKV (video)

Quick Start

bash
pip install content-core

python
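import asyncio
import content_core

async def main():
    result = await content_core.extract_content(url="https://example.com")
    print(result.content)

asyncio.run(main())  # a plain script needs an event loop to await the call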

Or with zero install:

bash
uvx content-core extract "https://example.com"

CLI Usage

Content Core provides a unified content-core command with subcommands for extraction, summarization, configuration, and running the MCP server.

Extract

bash
# From a URL
content-core extract "https://example.com"

# From a file
content-core extract document.pdf

# With JSON output
content-core extract document.pdf --format json

# With a specific engine
content-core extract "https://example.com" --engine firecrawl

# From stdin
echo "some text" | content-core extract
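The CLI calls above can also be driven from a Python script. The following is a minimal sketch, not part of Content Core: `build_extract_command` and `run_extract_json` are hypothetical helper names, and the sketch assumes that `--format json` writes a single JSON document to stdout. Only the `--format` and `--engine` flags shown above are used.

```python
import json
import shutil
import subprocess


def build_extract_command(source, output_format=None, engine=None):
    """Build the argv list for `content-core extract` (hypothetical helper)."""
    cmd = ["content-core", "extract", source]
    if output_format:
        cmd += ["--format", output_format]
    if engine:
        cmd += ["--engine", engine]
    return cmd


def run_extract_json(source, engine=None):
    """Run the CLI and parse its output; assumes stdout is one JSON document."""
    if shutil.which("content-core") is None:
        raise RuntimeError("content-core is not installed or not on PATH")
    cmd = build_extract_command(source, output_format="json", engine=engine)
    completed = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return json.loads(completed.stdout)


# Building the command needs no installation, so this line always runs.
print(build_extract_command("document.pdf", output_format="json"))
```

Passing the arguments as a list avoids shell quoting problems with URLs that contain `&` or `?`.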

Summarize

bash
# Summarize text
content-core summarize "Long article text here..."

# With context
content-core summarize "Long text" --context "bullet points"

# From stdin
cat article.txt | content-core summarize --context "explain to a child"

MCP Server

bash
content-core mcp

Configuration

bash
# Set persistent config
content-core config set llm_provider anthropic
content-core config set llm_model claude-sonnet-4-20250514

# List current config
content-core config list

# Delete a config value
content-core config delete llm_provider

Config is stored in ~/.content-core/config.toml. Priority: command flags > env vars > config file > defaults.

Zero-Install with uvx

All commands work without installation using uvx:

bash
uvx content-core extract "https://example.com"
uvx content-core summarize "text" --context "one sentence"
uvx content-core mcp

Python API

Extraction

python
import asyncio

import content_core
from content_core import ContentCoreConfig

async def main():
    # From a URL
    result = await content_core.extract_content(url="https://example.com")

    # From a file
    result = await content_core.extract_content(file_path="document.pdf")

    # From text
    result = await content_core.extract_content(content="some text")

    # With engine override
    config = ContentCoreConfig(url_engine="firecrawl")
    result = await content_core.extract_content(url="https://example.com", config=config)

# A plain script needs an event loop to run the awaited calls
asyncio.run(main())
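Because the extraction API is async, several sources can be processed concurrently. The sketch below shows one way to do that with a concurrency cap. `extract_many` is a hypothetical helper, not part of Content Core, and `fake_extract` is a stand-in so the example runs without network access; in real use you would presumably pass `content_core.extract_content` as the `extract` argument.

```python
import asyncio


async def extract_many(sources, extract, limit=3):
    """Extract several URLs concurrently, at most `limit` at a time.

    `extract` is any coroutine function that accepts url=...
    """
    semaphore = asyncio.Semaphore(limit)

    async def one(url):
        async with semaphore:
            try:
                return url, await extract(url=url)
            except Exception as exc:  # record the failure and keep going
                return url, exc

    pairs = await asyncio.gather(*(one(u) for u in sources))
    return dict(pairs)


async def fake_extract(url):
    """Stand-in extractor for illustration only."""
    await asyncio.sleep(0)
    if "bad" in url:
        raise ValueError(f"cannot extract {url}")
    return f"content of {url}"


results = asyncio.run(
    extract_many(["https://example.com/a", "https://example.com/bad"], fake_extract)
)
for url, outcome in results.items():
    print(url, "->", outcome)
```

Returning the exception instead of raising it means one unreachable source does not discard the results already gathered for the others.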

Summarization

python
import content_core

# Call from inside an async function, or a REPL/notebook with top-level await
summary = await content_core.summarize("long article text", context="bullet points")

Configuration

python
import asyncio
import content_core
from content_core import ContentCoreConfig

config = ContentCoreConfig(
    url_engine="firecrawl",
    document_engine="docling",
    audio_concurrency=5,
)

async def main():
    return await content_core.extract_content(url="https://example.com", config=config)

result = asyncio.run(main())

MCP Integration

Content Core includes a Model Context Protocol (MCP) server for use with Claude Desktop and other MCP-compatible applications.

Add to your claude_desktop_config.json:

json
{
  "mcpServers": {
    "content-core": {
      "command": "uvx",
      "args": ["content-core", "mcp"],
      "env": {
        "OPENAI_API_KEY": "sk-..."
      }
    }
  }
}

The MCP server exposes two tools: extract_content and summarize_content. Both return plain text.
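If claude_desktop_config.json already lists other servers, the content-core entry has to be merged in rather than pasted over the file. The sketch below does that merge on an in-memory dict. `add_content_core_server` is a hypothetical helper and `other-server` is a made-up existing entry; reading and writing the actual file is left out because its location varies by platform.

```python
import json


def add_content_core_server(config, env=None):
    """Return a copy of the config dict with a content-core server entry added."""
    merged = json.loads(json.dumps(config))  # simple deep copy for JSON data
    server = {"command": "uvx", "args": ["content-core", "mcp"]}
    if env:
        server["env"] = dict(env)
    merged.setdefault("mcpServers", {})["content-core"] = server
    return merged


# Hypothetical existing configuration with one unrelated server.
existing = {"mcpServers": {"other-server": {"command": "npx", "args": ["other"]}}}
updated = add_content_core_server(existing, {"OPENAI_API_KEY": "sk-..."})
print(json.dumps(updated, indent=2))
```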

For detailed setup, see the MCP documentation.

Claude Code Skill

Content Core includes a SKILL.md that teaches AI agents how to use it for extracting content from external sources. To make it available in your Claude Code project, copy it to your skills directory:

bash
# Download the skill
curl -o .claude/skills/content-core/SKILL.md --create-dirs \
  https://raw.githubusercontent.com/lfnovo/content-core/main/SKILL.md

Once installed, Claude Code can use content-core to extract content from URLs, documents, and media files — either via CLI (uvx content-core) or MCP if configured.

AI Providers

Content Core uses Esperanto to support multiple LLM and STT providers. Switch providers by changing the config — no code changes needed:

bash
# Use Anthropic for summarization
content-core config set llm_provider anthropic
content-core config set llm_model claude-sonnet-4-20250514

# Use Groq for transcription
content-core config set stt_provider groq
content-core config set stt_model whisper-large-v3

Supported providers include OpenAI, Anthropic, Google, Groq, DeepSeek, Ollama, and more. See the Esperanto documentation for the full list.

Configuration

Content Core uses ContentCoreConfig powered by pydantic-settings. Settings are resolved in priority order: constructor args > env vars (CCORE_*) > config file (~/.content-core/config.toml) > defaults.
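The priority order can be pictured as a stack of layers in which the first layer that defines a key wins. The following is an illustration of that lookup order using the standard library, not how ContentCoreConfig or pydantic-settings is implemented; `env_layer` and `resolve` are hypothetical names, and the defaults are taken from the environment variable table.

```python
from collections import ChainMap

DEFAULTS = {"url_engine": "auto", "document_engine": "auto", "audio_concurrency": 3}


def env_layer(environ, prefix="CCORE_"):
    """Turn CCORE_URL_ENGINE=... into {"url_engine": ...}."""
    return {
        key[len(prefix):].lower(): value
        for key, value in environ.items()
        if key.startswith(prefix)
    }


def resolve(constructor_args, environ, config_file):
    """Earlier mappings win: constructor args > env vars > config file > defaults."""
    return ChainMap(constructor_args, env_layer(environ), config_file, DEFAULTS)


settings = resolve(
    constructor_args={"url_engine": "firecrawl"},
    environ={"CCORE_URL_ENGINE": "jina", "CCORE_DOCUMENT_ENGINE": "docling"},
    config_file={"document_engine": "simple", "audio_concurrency": 5},
)
print(settings["url_engine"], settings["document_engine"], settings["audio_concurrency"])
```

Here the constructor argument beats the environment for `url_engine`, the environment beats the config file for `document_engine`, and the config file beats the default for `audio_concurrency`.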

Environment Variables

Variable	Description	Default
`CCORE_URL_ENGINE`	URL extraction engine (`auto`, `simple`, `firecrawl`, `jina`, `crawl4ai`)	`auto`
`CCORE_DOCUMENT_ENGINE`	Document extraction engine (`auto`, `simple`, `docling`)	`auto`
`CCORE_AUDIO_CONCURRENCY`	Concurrent audio transcriptions (1-10)	`3`
`CRAWL4AI_API_URL`	Crawl4AI Docker API URL (omit for local browser mode)	-
`FIRECRAWL_API_URL`	Custom Firecrawl API URL for self-hosted instances	-
`CCORE_FIRECRAWL_PROXY`	Firecrawl proxy mode (`auto`, `basic`, `stealth`)	`auto`
`CCORE_FIRECRAWL_WAIT_FOR`	Wait time in ms before extraction	`3000`
`CCORE_LLM_PROVIDER`	LLM provider for summarization	-
`CCORE_LLM_MODEL`	LLM model for summarization	-
`CCORE_STT_PROVIDER`	Speech-to-text provider	-
`CCORE_STT_MODEL`	Speech-to-text model	-
`CCORE_STT_TIMEOUT`	Speech-to-text timeout in seconds	-
`CCORE_YOUTUBE_LANGUAGES`	Preferred YouTube transcript languages	-

API keys for external services are set via their standard environment variables (e.g., OPENAI_API_KEY, FIRECRAWL_API_KEY, JINA_API_KEY).
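A deployment script may want to catch a mistyped engine name or an out-of-range concurrency value before starting a long job. The sketch below checks three of the variables against the values listed in the table. `check_ccore_env` is a hypothetical helper, and Content Core's own validation may behave differently.

```python
URL_ENGINES = {"auto", "simple", "firecrawl", "jina", "crawl4ai"}
DOCUMENT_ENGINES = {"auto", "simple", "docling"}


def check_ccore_env(environ):
    """Return a list of problems found in CCORE_* values (hypothetical helper)."""
    problems = []
    url_engine = environ.get("CCORE_URL_ENGINE", "auto")
    if url_engine not in URL_ENGINES:
        problems.append(f"CCORE_URL_ENGINE: unknown engine {url_engine!r}")
    document_engine = environ.get("CCORE_DOCUMENT_ENGINE", "auto")
    if document_engine not in DOCUMENT_ENGINES:
        problems.append(f"CCORE_DOCUMENT_ENGINE: unknown engine {document_engine!r}")
    raw = environ.get("CCORE_AUDIO_CONCURRENCY", "3")
    try:
        concurrency = int(raw)
    except ValueError:
        concurrency = None
    if concurrency is None or not 1 <= concurrency <= 10:
        problems.append(f"CCORE_AUDIO_CONCURRENCY: expected 1-10, got {raw!r}")
    return problems


# In a real script, pass os.environ instead of these sample dicts.
print(check_ccore_env({"CCORE_URL_ENGINE": "firecrawl"}))
print(check_ccore_env({"CCORE_URL_ENGINE": "selenium", "CCORE_AUDIO_CONCURRENCY": "25"}))
```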

Proxy Configuration

Content Core reads standard HTTP_PROXY / HTTPS_PROXY / NO_PROXY environment variables automatically. No additional configuration is needed.
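When the variables cannot be set in the shell, they can be set from Python before any extraction call is made. The sketch below uses a hypothetical proxy address and confirms through the standard library's `urllib.request.getproxies()` that the values are visible to the process. It shows the general convention only and makes no claim about which HTTP client Content Core uses internally.

```python
import os
import urllib.request

# urllib prefers lower-case variants, so clear them for a predictable demo.
for name in ("https_proxy", "no_proxy"):
    os.environ.pop(name, None)

# Hypothetical proxy address, for illustration only.
os.environ["HTTPS_PROXY"] = "http://proxy.internal.example:8080"
os.environ["NO_PROXY"] = "localhost,127.0.0.1"

proxies = urllib.request.getproxies()
print(proxies.get("https"))
print(proxies.get("no"))
```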

Optional Dependencies

bash
# Docling for advanced document parsing (PDF, DOCX, PPTX, XLSX)
# Quote the package spec so shells such as zsh do not expand the brackets
pip install "content-core[docling]"

# Crawl4AI for local browser-based URL extraction
pip install "content-core[crawl4ai]"
python -m playwright install --with-deps

# LangChain tool wrappers
pip install "content-core[langchain]"

# All optional features
pip install "content-core[docling,crawl4ai,langchain]"

Using with LangChain

When installed with the langchain extra, Content Core provides LangChain-compatible tool wrappers:

python
from content_core.tools import extract_content_tool, summarize_content_tool

tools = [extract_content_tool, summarize_content_tool]

Documentation

Usage Guide -- Python API details, configuration, and examples
Processors -- How content extraction works for each format
MCP Server -- Claude Desktop and MCP integration

Development

bash
git clone https://github.com/lfnovo/content-core
cd content-core

uv sync --group dev

# Run tests
make test

# Lint
make ruff

License

This project is licensed under the MIT License.

Contributing

Contributions are welcome! Please see our Contributing Guide for details.

Installation

TypingMind

Prerequisites:

Node.js 18+

{
  "mcpServers": {
    "content-core": {
      "command": "uvx",
      "args": [
        "--from",
        "content-core",
        "content-core-mcp"
      ]
    }
  }
}

Available Tools

extract_content
Extract content from a URL or file using Content Core's auto engine.

Args:
    url: Optional URL to extract content from
    file_path: Optional file path to extract content from

Returns:
    JSON object containing extracted content and metadata

Raises:
    ValueError: If neither or both url and file_path are provided
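The "neither or both" rule means a caller must supply exactly one source. A client can check this before sending the tool call, as in the sketch below. `validate_source` is a hypothetical helper that mirrors the documented rule; it is not the server's actual code.

```python
def validate_source(url=None, file_path=None):
    """Require exactly one of url / file_path, mirroring the documented rule."""
    if (url is None) == (file_path is None):
        raise ValueError("Provide exactly one of url or file_path")
    return ("url", url) if url is not None else ("file_path", file_path)


print(validate_source(url="https://example.com"))

try:
    validate_source(url="https://example.com", file_path="document.pdf")
except ValueError as exc:
    print("rejected:", exc)
```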

Use Content Core MCP with multiple AI models

TypingMind connects MCP tools at the workspace level, so once Content Core is connected, you can use it with different AI models in TypingMind instead of setting it up separately for each model. This MCP runs locally through the TypingMind MCP connector on your device.

Setup guide to use the local connector

Use this when the MCP server needs access to local files, apps, or private resources on your computer.

Open the MCP settings

In TypingMind, go to Settings, Advanced Settings, then Model Context Protocol and choose Setup Connector.

Open TypingMind in your browser.
Click the Settings icon.
Go to Advanced Settings.
Open the Model Context Protocol section.
Click Setup Connector and choose This Device.

[Screenshot: TypingMind MCP connector setup screen with This Device selected]

Run the connector command

Choose This Device, copy the command from TypingMind, and run it in Terminal. Keep the process running while you use MCP.

Copy the setup command shown by TypingMind.
Open Terminal on macOS or Windows Terminal on Windows.
Paste and run the command.
Approve the package install if Terminal asks you to proceed.
Keep the Terminal window running while using MCP tools.

Add Content Core as a server

When the connector status is Ready, click Edit Servers and paste the MCP server configuration.

Wait until the connector status shows Ready.
Click Edit Servers.
Paste the Content Core MCP server configuration.
Save the server list.
Refresh if you want to confirm the connector is still ready.

[Screenshot: TypingMind MCP settings showing active server and Edit Servers button]

Use it across models

Save the server list, open Plugins, enable the Content Core MCP tools, then select any supported AI model in TypingMind and use the tools in chat or assign them to an AI agent.

Open the Plugins page in TypingMind.
Enable the Content Core MCP tools.
Start a chat and choose the AI model you want to use.
Use the MCP tools in chat or assign them to an AI agent.
Switch to another AI model whenever needed without reconnecting MCP.

[Screenshot: TypingMind chat using enabled MCP tools with a selected AI model]

Frequently asked questions

What is the Content Core MCP server used for?

Content Core is an MCP server that lets compatible AI clients extract content from URLs and files. In TypingMind, you can add this MCP server once and make its tools available in your AI workspace.

Can I use Content Core MCP with multiple AI models in TypingMind?

Yes. TypingMind connects MCP tools at the workspace level, so you can use Content Core with different AI models such as Claude, ChatGPT, Gemini, or other models you have configured in TypingMind without setting up the MCP server separately for each model.

Why use Content Core MCP with TypingMind?

TypingMind brings multiple AI models, prompts, plugins, AI agents, API keys, and MCP tools into one workspace. With Content Core connected, you can use its MCP tools across your preferred models while keeping your chat workflow organized in TypingMind.

How do I connect Content Core MCP to TypingMind?

Content Core runs through the TypingMind local MCP connector. This is best when the MCP server needs access to local files, desktop apps, command-line tools, or private resources on your computer.

What tools does Content Core MCP provide in TypingMind?

Content Core exposes 1 MCP tool, extract_content, which can be enabled from the TypingMind Plugins page and used in chat or assigned to AI agents.

Do I need to share my API keys with TypingMind to use Content Core MCP?

No. TypingMind is local-first and lets you keep your model providers, API keys, prompts, and MCP configuration under your control. If Content Core requires authentication, add the required headers, OAuth settings, or local configuration for that MCP server when you create the connection.

