One-V LLM Serve

Description

One-V LLM Serve makes every public page on your WordPress site available as clean Markdown at the same URL with a .md extension — zero configuration required.

https://example.com/about/       HTML page for humans
https://example.com/about.md     clean Markdown for AI

AI systems — ChatGPT, Perplexity, ClaudeBot, Google AI Overviews, and most RAG pipelines — parse Markdown far more efficiently than HTML. When these systems encounter an HTML page, they must strip navigation, headers, footers, sidebars, scripts, and tracking pixels before they can read the actual content. This noise introduces errors, increases token cost, and leads to lower-quality outputs.

The Markdown file contains a configurable YAML frontmatter block followed by the page title, headings in correct hierarchy, and the body text. Nothing else.
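On the consuming side, the frontmatter can be separated from the body with a few lines of code. The following is a minimal sketch, assuming the block is delimited by `---` lines and holds flat `key: value` pairs; the function name `split_frontmatter` and the sample document are illustrative, not part of the plugin.

```python
def split_frontmatter(markdown_text: str) -> tuple[dict[str, str], str]:
    """Split a document into (frontmatter fields, body).

    Assumption: the frontmatter is delimited by lines containing only '---'
    and holds flat 'key: value' pairs. Nested YAML is not handled here.
    """
    lines = markdown_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, markdown_text
    fields: dict[str, str] = {}
    for index, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            # Everything after the closing delimiter is the Markdown body.
            body = "\n".join(lines[index + 1:]).lstrip("\n")
            return fields, body
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip().strip('"')
    # No closing delimiter found: treat the whole text as body.
    return {}, markdown_text


sample = "---\ntitle: About\nurl: https://example.com/about/\n---\n\n# About\n\nBody text.\n"
fields, body = split_frontmatter(sample)
```

A real consumer would likely use a full YAML parser; the hand-rolled split only keeps the sketch dependency-free.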

Core features

  • Zero-config Markdown endpoint for every public post, page, and custom post type
  • YAML frontmatter with configurable fields (title, date, modified, url, description, image, tags, categories, lang, type)
  • /llms.txt discovery file at the site root following the llmstxt.org convention
  • Taxonomy archives as Markdown — /category/news.md, /tag/foo.md, custom taxonomies
  • ?format=markdown query parameter as an alternative to the .md URL on any singular page
  • Per-post exclude via a sidebar checkbox on the post editor
  • Works with Classic Editor and Gutenberg via the the_content filter
  • ACF integration — opt-in per-post: pick which text, textarea, WYSIWYG, URL, email, or link fields to append below the body
  • Filterable AI analytics: per-hit events with full denormalised dimensions (UA bucket, referrer host, language, post type, response code). A sticky filter bar drives every chart and table live: six KPI tiles, a stacked-area time chart, three composition donuts (UA bucket / referrer source / language), four Top tables, a User-Agent classifier transparency table, and a Recent Activity stream. Referrers are tracked by hostname only; paths and query strings are stripped before storage, so no PII is retained. Classification is forward-compatible: when the bot or referrer catalogue is updated in a future release, historical rows are reclassified automatically, with no Reset Analytics required.
  • Browser-bucket sub-classification — anything that looks like a browser visit gets split into four kinds based on the Sec-Fetch-Site, Sec-Fetch-User, and Sec-CH-UA request headers a real browser sends: verified user (top-level navigation triggered by a click or address-bar Enter in a recognised browser), headed agent (real Chromium driven programmatically — Playwright, Puppeteer, Selenium), script agent (bare HTTP client imitating a browser UA — requests, httpx, LangChain, custom agents), spoofer (UA shape that no real browser would emit, like modern Chrome with a non-reduced UA). Visible as a stacked-bar breakdown on the User-Agents subpage so you can see at a glance how much of your “human” traffic is actually automation, and rendered inline as colour-coded slugs on every browser-bucket row in the Recent Activity table on the Analytics page. Detection is server-side fingerprinting of the request itself — no cookies, no JS, no IP.

Discoverability

  • Link: rel="alternate"; type="text/markdown" HTTP header on every HTML page
  • <link rel="alternate"> tag in <head> for HTML-based discovery
  • Allow: /*.md$ directive in robots.txt
  • CORS Access-Control-Allow-Origin: * on .md and /llms.txt so browser-based AI clients can fetch them
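A client can use the Link header to locate the Markdown alternate without guessing the URL. The sketch below assumes the header carries a target in angle brackets, as in `<https://example.com/about.md>; rel="alternate"; type="text/markdown"`; the exact header value the plugin emits is not shown in this document, and `find_markdown_alternate` is a hypothetical helper.

```python
import re
from typing import Optional


def find_markdown_alternate(link_header: str) -> Optional[str]:
    """Return the target of the rel="alternate" text/markdown link, if any."""
    # Each link-value looks like: <URL>; param="value"; param="value"
    for match in re.finditer(r'<([^>]*)>((?:\s*;\s*[^,;]+)*)', link_header):
        url, raw_params = match.group(1), match.group(2)
        params = {}
        for part in raw_params.split(";"):
            name, sep, value = part.strip().partition("=")
            if sep:
                params[name.lower()] = value.strip().strip('"').lower()
        # Simplification: rel is compared as a single token, not a token list.
        if params.get("rel") == "alternate" and params.get("type") == "text/markdown":
            return url
    return None


header = (
    '<https://example.com/about/>; rel="canonical", '
    '<https://example.com/about.md>; rel="alternate"; type="text/markdown"'
)
markdown_url = find_markdown_alternate(header)
```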

Operations

  • Transient caching with automatic invalidation on save_post, on ACF field value saves, on any ACF field group change, and on plugin settings save
  • “Clear cache” button in the settings page
  • Admin notice on fallback HTTP fetch failures
  • “Settings” link next to the plugin row in Plugins screen
  • “View .md” row action in the Posts and Pages list tables

Developer hooks

  • ovls_markdown filter for the final Markdown output
  • ovls_frontmatter filter for adding, removing, or modifying frontmatter fields
  • ovls_content_queries filter for the HTML extraction XPath cascade

How it works

Each request to /about.md is captured by a WordPress rewrite rule and routed through the plugin’s content generator. The generator runs the post through apply_filters( 'the_content', ... ) — the same pipeline WordPress uses on the front end — so Classic Editor, Gutenberg, and shortcodes all work without separate code paths. The rendered HTML is converted to Markdown via league/html-to-markdown, then cached in a WordPress transient.

The cache is invalidated automatically on save_post, on ACF field/group changes, and whenever plugin settings are saved. A manual Clear cache button is also available on the settings page.

Access methods

There are three ways to reach the Markdown version of a page:

  • .md extension, e.g. https://example.com/about.md
  • ?format=markdown query, e.g. https://example.com/about/?format=markdown
  • Link: rel="alternate" header, returned by every HTML page, which advertises the Markdown URL for clients to follow

The .md URL is the recommended canonical form.
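For clients that start from an HTML permalink, both URL forms can be derived mechanically. This is a minimal sketch: `markdown_urls` is a hypothetical helper, and mapping the site root to `/index.md` is an assumption based on the changelog's mention of index.md.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def markdown_urls(page_url: str) -> tuple[str, str]:
    """Return (.md URL, ?format=markdown URL) for an HTML permalink."""
    parts = urlsplit(page_url)
    path = parts.path.rstrip("/")
    # Assumption: the homepage is addressed as /index.md.
    md_path = (path or "/index") + ".md"
    md_url = urlunsplit((parts.scheme, parts.netloc, md_path, "", ""))
    # Keep any existing query parameters and append format=markdown.
    query = parse_qsl(parts.query, keep_blank_values=True) + [("format", "markdown")]
    query_url = urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(query), ""))
    return md_url, query_url


about_md, about_query = markdown_urls("https://example.com/about/")
```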

ACF integration

When Advanced Custom Fields is active, ACF field rendering is opt-in at two levels:

  1. Site defaults per post type: at Settings → One-V LLM Serve → ACF Defaults, tick fields that should be appended to every post of a given post type.
  2. Per-post override: the One-V LLM Serve metabox on each post editor lists every supported ACF field applicable to that post. Tick fields to replace the site defaults for that one post.

Supported ACF types: text, textarea, wysiwyg, url, email, link. Each selected field is rendered under a ## Field Label heading. Empty fields are skipped.
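The rendering rule described above (one `## Field Label` section per selected field, empty fields skipped) can be illustrated as follows. This is a sketch of the rule only, not the plugin's PHP implementation; `append_acf_sections` and the sample fields are made up for the example.

```python
def append_acf_sections(body: str, fields: list[tuple[str, str]]) -> str:
    """Append each non-empty (label, value) pair below the body as a level-2 section."""
    sections = [body.rstrip()]
    for label, value in fields:
        if not value or not value.strip():
            continue  # Empty fields are skipped.
        sections.append(f"## {label}\n\n{value.strip()}")
    return "\n\n".join(sections) + "\n"


rendered = append_acf_sections(
    "# About\n\nBody.",
    [("Tagline", "Hello"), ("Contact Email", "")],
)
```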

Disclaimer

This plugin is provided “as is”, without warranty of any kind, express or implied, in accordance with the GNU General Public License v2 or later. The authors and contributors are not liable for any direct, indirect, incidental, special, or consequential damages — including but not limited to data loss, lost profits, business interruption, search-ranking changes, or third-party claims — arising from the use of, or inability to use, this software, even if advised of the possibility of such damages.

By installing and activating the plugin you acknowledge that:

  • You are responsible for testing the plugin in a staging environment before deploying to production.
  • You are responsible for the content this plugin exposes as Markdown — .md URLs and /llms.txt serve the same content as their HTML counterparts and are intended to be crawled and consumed by AI systems and third-party LLMs.
  • The plugin does not transmit data to any external service. All Markdown generation, caching, and file writes happen on your own server.

Nothing in this disclaimer is intended to exclude or limit liability for matters that cannot lawfully be excluded under the consumer-protection laws of your jurisdiction. For the full legal terms see the GPLv2 license at https://www.gnu.org/licenses/gpl-2.0.html.

Screenshots

  • Settings page — master toggle, llms.txt and taxonomy toggles, frontmatter field picker, ACF Defaults, and post-type selection.
  • Rendered Markdown output with YAML frontmatter served at the .md URL.
  • Another rendered example showing a different page as Markdown.

Installation

  1. Upload the one-v-llm-serve folder to /wp-content/plugins/, or install via Plugins → Add New → Upload Plugin.
  2. Activate the plugin through the Plugins screen in WordPress.
  3. Visit Settings → One-V LLM Serve to configure post types, frontmatter fields, and ACF defaults.

Rewrite rules are flushed automatically on activation. If .md URLs return 404 immediately after activation, go to Settings → Permalinks and click Save Changes.

FAQ

Does activating the plugin change my existing pages?

No. The plugin only responds to .md URLs, /llms.txt, and the ?format=markdown query parameter. All existing HTML URLs are unaffected.

Will the .md URLs hurt my SEO?

No. Search engines receive X-Robots-Tag: noindex, follow on .md responses, so they do not index the Markdown variants and the canonical HTML page remains the sole entry in Google, Bing, and other search results. The Link: rel="alternate"; type="text/markdown" header on each HTML page advertises the Markdown alternate to AI consumers without exposing it to SERPs.

Does it work with password-protected posts?

No. Password-protected and private posts return 404 on the .md URL. Only published posts are served.

What Markdown flavour is used?

CommonMark-compatible Markdown via league/html-to-markdown. The output uses ATX-style headings (#), inline links ([text](url)), and fenced code blocks.

Where is the Markdown cached?

In WordPress transients (database by default, or your object cache). Entries are invalidated when the post is saved, when ACF fields or settings change, or when you click Clear cache — a long safety-net expiry also lets any orphaned entry clear itself on object-cache setups.

The .md URL returns 404 after activation.

Go to Settings → Permalinks and click Save Changes to flush rewrite rules.

Can I disable Markdown for specific posts?

Yes. Two ways:

  1. Check Exclude from Markdown in the One-V LLM Serve metabox on the post editor.
  2. Return '' from an ovls_markdown filter callback.

Does it work with page builders like Elementor or Divi?

Yes. Any builder that hooks into the_content is supported (Elementor, Divi, WPBakery, Beaver Builder). For builders that bypass the_content, the plugin falls back to fetching the rendered frontend HTML and extracting the main content area.

Is it compatible with caching plugins?

Yes. Markdown is stored in WordPress transients. Object caches (Redis, Memcached) work transparently — and Clear cache correctly invalidates them, not just the database. Full-page caching layers (WP Rocket, W3 Total Cache, LiteSpeed Cache) serve fresh Markdown on the next request after a save.

/llms.txt returns 404 on WPEngine / Kinsta / managed nginx hosts

Some managed WordPress hosts configure their nginx to serve static file extensions (.txt, .xml, …) directly from disk without passing the request to WordPress. When the file is generated dynamically by a plugin, that produces a 404 because nothing exists on disk.

Fix: enable Settings → One-V LLM Serve → Write llms.txt to disk. The plugin then maintains a real /llms.txt file at the site root, regenerating it on every post save, ACF change, or settings update. The file carries a marker comment on the first line; the plugin refuses to overwrite a /llms.txt it did not create. On plugin deletion the managed file is removed via uninstall.php.
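The "refuse to overwrite a file we did not create" guard can be sketched as below. The marker text comes from the changelog ("Generated by One-V LLM Serve"); wrapping it in an HTML comment and the helper name `write_managed_file` are assumptions made for illustration, not the plugin's actual format or code.

```python
from pathlib import Path

MARKER = "Generated by One-V LLM Serve"


def write_managed_file(path: Path, content: str) -> bool:
    """Write the file only if it is absent or already carries the marker on line 1."""
    if path.exists():
        with path.open(encoding="utf-8", errors="replace") as handle:
            first_line = handle.readline()
        if MARKER not in first_line:
            return False  # A foreign file is left untouched.
    # Hypothetical marker syntax: an HTML comment on the first line.
    path.write_text(f"<!-- {MARKER} -->\n{content}", encoding="utf-8")
    return True
```

Checking only the first line keeps the guard cheap, and a hand-written /llms.txt is never modified.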

Does disk-mode work on WordPress installed in a subdirectory or on multisite?

Not currently. The disk-mode writer assumes WordPress is installed at the site root (ABSPATH is the public root). Subdirectory installs (/wp/) and multisite are not supported by disk-mode yet — for those, the dynamic rewrite-rule path is still available on hosts where nginx forwards .txt requests to PHP (most hosts other than WPEngine/Kinsta).

My .md endpoint returns the same X-Robots-Tag to every bot. Why?

The plugin sends a User-Agent-conditional X-Robots-Tag: AI crawlers (GPTBot, ClaudeBot, PerplexityBot, …) get index, follow so they will use the Markdown variant, while traditional search engines (Bingbot, Googlebot, …) get noindex, follow so the canonical HTML page remains the sole entry in SERPs. To keep shared caches from collapsing the two variants into one, the response carries Vary: User-Agent and Cache-Control: private, max-age=0, must-revalidate.
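The header selection described above might look roughly like this. It is a sketch under stated assumptions: the token list contains only the three AI crawlers named in the text, treating every other User-Agent as "noindex, follow" is a guess, and `markdown_response_headers` is a hypothetical name rather than the plugin's function.

```python
# Illustrative subset; the plugin's real signature list is longer.
AI_CRAWLER_TOKENS = ("gptbot", "claudebot", "perplexitybot")


def markdown_response_headers(user_agent: str, allow_search_indexing: bool = False) -> dict[str, str]:
    """Choose X-Robots-Tag by User-Agent and mark the response as UA-dependent."""
    ua = user_agent.lower()
    is_ai_crawler = any(token in ua for token in AI_CRAWLER_TOKENS)
    # Assumption: anything not recognised as an AI crawler gets noindex, follow.
    robots = "index, follow" if (is_ai_crawler or allow_search_indexing) else "noindex, follow"
    return {
        "X-Robots-Tag": robots,
        "Vary": "User-Agent",
        "Cache-Control": "private, max-age=0, must-revalidate",
    }


bing = markdown_response_headers("Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)")
gpt = markdown_response_headers("Mozilla/5.0 (compatible; GPTBot/1.0)")
```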

These signals work on standard WordPress hosting. They can be overridden by edge layers in specific hosting / CDN configurations:

  • Some managed WordPress hosts (Kinsta, WPEngine, Pressable, SiteGround, and others) ship default edge caching that treats static-looking file extensions like .md as long-lived static assets and rewrites the plugin’s Cache-Control header to a public, long-max-age value.
  • Some CDNs (most notably Cloudflare on its default cache key) ignore Vary: User-Agent entirely — they cache one variant per URL and serve it to every visitor regardless of UA.

When one of these is in front of your site, the first .md request to reach the edge caches the response for everyone afterwards. The plugin is still doing the right thing at origin, but visitors only ever see the cached copy.

Diagnosing it. Open a terminal and run:

curl -skI -A "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)" https://example.com/your-page.md

Headers to look for:

  • cf-cache-status: HIT (Cloudflare), x-kinsta-cache: HIT, x-cache: HIT (generic) — the response is coming from the edge cache.
  • age: <large number> — the response has been sitting in cache for that many seconds.
  • cache-control: public, max-age=<large> instead of the plugin’s private, max-age=0 — your host or CDN has rewritten it.

Then add a cache-busting query string and try again:

curl -skI -A "Mozilla/5.0 (compatible; bingbot/2.0; …)" "https://example.com/your-page.md?cb=12345"

If the headers are now correct (x-robots-tag: noindex, follow for Bingbot, index, follow for an AI UA), the plugin is fine — the edge is the source of the problem.
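The same header checks can be scripted when several URLs need auditing. The sketch below inspects a dictionary of response headers, however they were fetched, and reports the three symptoms listed above; `edge_cache_findings` and its message strings are illustrative.

```python
def edge_cache_findings(headers: dict[str, str]) -> list[str]:
    """Report signs that an edge cache, not the origin, produced the response."""
    h = {name.lower(): value.strip() for name, value in headers.items()}
    findings = []
    for name in ("cf-cache-status", "x-kinsta-cache", "x-cache"):
        if "hit" in h.get(name, "").lower():
            findings.append(f"{name} reports a cache hit")
    age = h.get("age", "")
    if age.isdigit() and int(age) > 0:
        findings.append(f"response has been cached for {age} seconds")
    if "public" in h.get("cache-control", "").lower():
        findings.append("cache-control was rewritten to a public policy")
    return findings


cached = edge_cache_findings(
    {"CF-Cache-Status": "HIT", "Age": "86400", "Cache-Control": "public, max-age=31536000"}
)
origin = edge_cache_findings({"Cache-Control": "private, max-age=0, must-revalidate"})
```

An empty list suggests the response came from origin with the plugin's headers intact.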

Fixing it. The fix has to be applied at the layer that is caching, not in WordPress. Common remedies:

  • On the CDN, exclude *.md URLs from any “Cache Everything” rule, or add a Bypass Cache rule for them.
  • On managed hosts, contact support and ask them to exempt *.md from the host’s edge cache (so the plugin’s Cache-Control: private is honoured).
  • If neither is available, enable Allow search engines to index .md (advanced) at One-V LLM Serve → Settings. The plugin then sends index, follow to every UA, so the behavior is consistent, at the cost of allowing search engines to index the Markdown variants alongside the HTML pages.

What does the AI bot analytics feature collect?

For each .md request the plugin stores: the timestamp, the User-Agent string (deduplicated via a small dictionary table — one row per unique UA), the referrer hostname only (path and query string are stripped), the requested post or term and its post type / taxonomy, the post language, and the HTTP response code (200 / 304 / 404).

Never stored: IP addresses, cookies, session identifiers, geolocation, full referrer URLs with query strings, or any user-account data. Only visits to .md URLs are counted; regular HTML pages are not tracked.
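Reducing a referrer to its hostname before storage is a small operation. This is a minimal sketch using the Python standard library; it shows the idea, not the plugin's PHP code, and `referrer_host` is a hypothetical name.

```python
from urllib.parse import urlsplit


def referrer_host(referrer: str) -> str:
    """Return only the lowercased hostname of a Referer value, or '' if none."""
    try:
        host = urlsplit(referrer.strip()).hostname
    except ValueError:
        # Malformed values (for example a broken IPv6 literal) are discarded.
        return ""
    return host or ""


stored = referrer_host("https://Chat.Example.com/c/123?utm_source=newsletter&email=a%40b.c")
```

The path, query string, and fragment never leave the function, so nothing encoded in them can reach the database.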

Detailed per-hit events are kept for the number of days you choose in Settings (default: 365). Older events can optionally be rolled up into a daily aggregate table for long-term trend charts; the aggregate retains only the bucket, language, referrer-bucket, and response-code dimensions, with no per-UA or per-post detail. A daily WP-Cron job, ovls_events_cleanup, enforces the retention period.

Analytics is enabled by default and can be turned off at One-V LLM Serve → Settings. All stored data can be wiped at One-V LLM Serve → Analytics → Reset analytics. Uninstalling the plugin drops the three analytics tables (ovls_events, ovls_ua_dict, ovls_events_archive) completely.

Why doesn’t my analytics show every AI bot that visits?

The plugin classifies crawlers by their User-Agent header. Almost all major AI companies (OpenAI, Anthropic, Perplexity, Google, Apple, Amazon, Meta, ByteDance, Cohere, Mistral, Common Crawl, …) honestly self-identify because they have publicly committed to respecting robots.txt. Those are detected accurately.

However, some traffic is invisible to User-Agent-based detection:

  • Stealth crawlers that spoof a regular browser User-Agent (some training-data brokers, certain less-ethical scrapers).
  • Agentic browsers like OpenAI Operator or Claude/Claude-Browse running an actual headless Chrome — technically indistinguishable from a human visit at the header level.
  • AI assistants that ingest a page through an integrated third-party fetcher (e.g. a python-requests script) — these show up under “Other bot” rather than the underlying model.

For these the plugin records what it can — they will appear in the “Other bot” bucket or the “Browser” bucket with a sub-classification that flags the request as a Playwright-class headed agent, a script agent, or a UA-shape spoofer rather than a real user. The plugin’s bot signature list is updated each release as new identifiers are publicly documented.

How do you tell a real human from a scraper that imitates a browser User-Agent?

By looking at request headers that real browsers emit automatically and bare HTTP clients usually don’t. Specifically:

  • Sec-Fetch-User: ?1 — present only when the navigation was triggered by user activation (link click, address-bar Enter, tap-out from an AI app to the system browser). Programmatic navigation in Playwright/Puppeteer doesn’t set it.
  • Sec-CH-UA brand list — a real downstream browser (Chrome, Edge, Brave, Opera, Vivaldi, Yandex, Samsung Internet, Arc, DuckDuckGo) announces its brand here. The open-source Chromium build that Playwright runs by default does not — it identifies as bare "Chromium". Detectable difference.
  • Sec-Fetch-Site — sent by every modern browser since Safari 16.4 (March 2023). Absence indicates the request didn’t come from a browser engine at all.
  • User-Agent shape — Chrome ≥ 110 must report Chrome/X.0.0.0 (User-Agent Reduction); a UA claiming Chrome/133.0.6943.141 with non-zero minor digits is impossible for a real browser and flags the request as a copy-pasted scraper UA.

Combined, these four signals split the “Browser” bucket into verified user, headed agent (real Chromium under automation), script agent (curl/httpx/requests imitating a UA), and spoofer (impossible UA shape). None of this requires JavaScript or cookies — it’s all server-side inspection of the HTTP request the client already sent.
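To make the combination of signals concrete, here is a hedged sketch of a four-way classifier. The order in which the signals are checked, the fallback to "headed agent" when Sec-Fetch-User is missing, and the short brand list are all assumptions; the plugin's real rules are not given in this document, and `classify_browser_request` is a hypothetical name.

```python
import re

# Illustrative subset of branded Chromium browsers, not the plugin's catalogue.
KNOWN_BRANDS = ("google chrome", "microsoft edge", "brave", "opera")


def classify_browser_request(headers: dict[str, str]) -> str:
    """Classify a browser-looking request from its headers alone."""
    h = {name.lower(): value for name, value in headers.items()}
    chrome = re.search(r"Chrome/(\d+)\.(\d+)\.(\d+)\.(\d+)", h.get("user-agent", ""))
    # UA shape: a reduced UA reports Chrome/X.0.0.0 from major version 110 on.
    if chrome and int(chrome.group(1)) >= 110 and any(int(g) != 0 for g in chrome.groups()[1:]):
        return "spoofer"
    # Sec-Fetch-Site: absent when no browser engine produced the request.
    if "sec-fetch-site" not in h:
        return "script agent"
    # Sec-CH-UA: a Chrome-family UA without a recognised brand looks like bare Chromium.
    if chrome and not any(brand in h.get("sec-ch-ua", "").lower() for brand in KNOWN_BRANDS):
        return "headed agent"
    # Sec-Fetch-User: ?1 is sent only for user-activated navigations.
    if h.get("sec-fetch-user", "").strip() == "?1":
        return "verified user"
    return "headed agent"


UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/133.0.0.0 Safari/537.36"
real_chrome = classify_browser_request({
    "User-Agent": UA,
    "Sec-Fetch-Site": "none",
    "Sec-Fetch-User": "?1",
    "Sec-CH-UA": '"Not(A:Brand";v="99", "Google Chrome";v="133", "Chromium";v="133"',
})
```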

A motivated scraper can manually set these headers to bypass detection. The classifier doesn’t claim to catch every bot ever — it catches default-configuration tools, which is the vast majority.

Is the analytics feature GDPR-compliant?

The events store no personal data: no IP addresses, no user identifiers, no cookies, and no full referrer URLs (the path and query string are stripped before storage, so utm parameters or any PII that might be encoded in a URL never reach the database). User-Agent strings, Fetch Metadata Request Headers (Sec-Fetch-Site, Sec-Fetch-User), and User-Agent Client Hints (Sec-CH-UA) are technical request-level metadata used to classify automated crawlers and tell browser-class clients apart from script-class clients. The Fetch Metadata and Client Hints headers carry strictly less information than the User-Agent and don’t, on their own or in combination, identify a person. The plugin relies on legitimate interest (Art. 6(1)(f) GDPR) as the lawful basis. Server-side bot analytics is widely accepted as a legitimate interest, and the suggested privacy text at Tools → Privacy discloses precisely what is collected so it can be copied into your site policy.

If your jurisdiction requires explicit user disclosure for any server-side analytics, you can turn the feature off at One-V LLM Serve → Settings.

Reviews

May 29, 2026
This plugin made my company website easy for AI to read. It turns pages into clean Markdown and it works really well. I don’t usually expect something like this for free, but it is well made and fits this new approach perfectly.
May 22, 2026
The Markdown versions of pages are generated perfectly, and after implementation I noticed a clear improvement in content indexing and visibility for AI systems. A very useful solution for modern SEO and LLM optimization for WordPress websites.
May 22, 2026
Huge thanks to the One-V LLM Serve team for such a thoughtful plugin. Zero config, clean Markdown output, ACF support — everything just works. Exactly what WordPress sites need to stay relevant in the age of AI.
May 21, 2026
Excellent plugin for making WordPress content AI-friendly. The Markdown output is clean, lightweight, and works exactly as described. Setup is simple, performance is solid, and the ACF integration is surprisingly useful. A very smart solution for modern AI indexing and RAG workflows.

Changelog

1.1.0

Major feature release: AI-traffic analytics and stronger crawler-facing HTTP semantics. Highlights:

  • Added: AI traffic analytics — a new top-level “One-V LLM Serve” admin menu with an Analytics subpage and a WP-Admin dashboard widget. Per-hit events record the UA bucket (AI / search / other-bot / browser / unknown), referrer source, language, target post or term, and response code, with a sticky filter bar that drives every chart and table live, KPI tiles, a time chart, composition donuts, Top tables, a User-Agent classifier transparency table, and a Recent Activity stream. Referrers are stored by hostname only — no IPs, cookies, or full URLs are ever recorded.
  • Added: Browser-bucket sub-classification — traffic that looks like a browser is split into verified user / headed agent (Playwright-class automation) / script agent (curl, httpx, requests, LangChain) / spoofer, using server-side fingerprinting of the Sec-Fetch and Sec-CH-UA request headers. No JavaScript, no cookies, no IP.
  • Added: Referrer attribution with a six-bucket catalogue (search / chatbot / social / direct / internal / other). The “chatbot” bucket surfaces when an LLM cited your page in an answer and the reader clicked through.
  • Added: Forward-compatible classification — when the bot or referrer catalogue is updated in a future release, historical rows are reclassified automatically; no Reset Analytics required. wp ovls reclassify runs the same pass on demand.
  • Changed: User-Agent-conditional X-Robots-Tag — known AI crawlers receive index, follow so they ingest the Markdown variant, while search engines receive noindex, follow so the canonical HTML page stays the sole SERP entry. A new “Allow search engines to index .md (advanced)” toggle opts every User-Agent into index, follow.
  • Added: Conditional GET (ETag / If-None-Match and Last-Modified / If-Modified-Since) returning 304 without a body, plus X-Content-Type-Options: nosniff, Referrer-Policy: no-referrer, Vary: User-Agent, Accept-Encoding, and Cache-Control: private, max-age=0, must-revalidate.
  • Added: X-OVLS-Version HTTP header on every .md response and in the /llms.txt marker, so operators can confirm the deployed build with a single curl -I.
  • Added: WP-CLI namespace wp ovls … with the subcommands flush, regenerate, list, warm, and reclassify.
  • Added: Multilingual slug-fallback for WPML and Polylang so language-prefixed .md URLs resolve to the correct translation instead of the default-language post.
  • Fixed: Unaddressable permalinks — off-host “Page Links To” links, non-viewable custom post types, and missing permalinks are now skipped everywhere a .md link is built and 404 on direct request, instead of polluting /llms.txt.
  • Fixed: Clearing the cache now reliably invalidates entries on sites running a persistent object cache (Redis / Memcached), and cached Markdown no longer loads into memory on every request.
  • Changed: The admin menu moved to its own top-level “One-V LLM Serve” entry (Settings + Analytics). The settings page slug is unchanged, so existing deep-links keep working.
  • Added: Edge-cache guidance for hosts and CDNs (Kinsta, Cloudflare) that cache .md across User-Agents and defeat the conditional headers; Kinsta installs get an in-dashboard notice.
  • Added: /llms.txt now carries the “Generated by One-V LLM Serve” marker in both dynamic and disk-mode delivery, and lists posts of every language on multilingual sites.
  • Hardening: PHP 8.0 runtime guard, a generator lock plus 30-second wall-clock cap against bot storms, 508 loop detection, dbDelta-failure admin notices, a /llms.txt size cap, self-healing rewrite flush on upgrade, and safer HTML/YAML handling.
  • Added: Suggested Privacy Policy text at Tools → Privacy describing exactly what the analytics feature collects and how to opt out.

1.0.2

  • Added: rel="canonical" Link HTTP header on .md responses pointing to the HTML permalink, which consolidates SEO signals and avoids duplicate-content indexing. The index.md endpoint points at the homepage; a taxonomy term .md points at the term archive.
  • Added: Disclaimer section in readme covering warranty, liability, and data-transmission stance per GPLv2.

1.0.1

  • Fixed: disk-mode for /llms.txt now detects multisite and “WordPress in a subdirectory” installs and refuses to write to ABSPATH when it does not map to the public docroot. Settings page surfaces a clear “unsupported install layout” state instead of silently writing the file to the wrong location. The dynamic rewrite-rule path keeps working on all install layouts.
  • Fixed: uninstall script applies the same layout check before attempting to delete the managed /llms.txt.

1.0.0

  • Initial release.