Release 0.4.17 — Scrapy 2.16 Crawl Fix
Released: 2026-05-22 Chart:
oci://ghcr.io/cnoe-io/charts/ai-platform-engineering:0.4.17Previous release: 0.4.16
Highlights
0.4.17 is a targeted fix for RAG web ingestion. Scrapy 2.16.0 added validation that raised AttributeError whenever a spider defined a singular start_url attribute without a populated start_urls, which broke every crawl mode — single, recursive, and sitemap — resulting in zero pages crawled for any web ingestion request. This release renames the conflicting attribute, adds the new async spider entrypoint, and pins a stable Twisted to resolve a Scrapy 2.16 TLS bug.
What's New
No new features in this release — it is a focused bug-fix release for RAG web crawling.
Bug Fixes
- rag: restore web crawling under Scrapy 2.16 — renamed the custom
self.start_urlattribute toself.origin_urlacross all five affected spiders (scrapy_worker.py,spiders/base.py,spiders/recursive.py,spiders/single_url.py,spiders/sitemap.py) to avoid Scrapy 2.16's newstart_url/start_urlsvalidation, added an asyncstart()entrypoint for the new Scrapy API, and pinnedtwisted==26.4.0to fix a Scrapy 2.16 TLS bug (#1489)
Breaking Changes
No breaking changes. Drop-in upgrade from 0.4.16.
Known Issues
None known at this time.
Upgrade
helm upgrade ai-platform-engineering \
oci://ghcr.io/cnoe-io/charts/ai-platform-engineering \
--version 0.4.17 \
-f your-values.yaml
Upgrade Guide: 0.4.16 → 0.4.17
Overview
Drop-in upgrade — no values.yaml edits required. 0.4.17 fixes a critical regression in RAG web ingestion caused by the Scrapy 2.16 upgrade. If you use the RAG stack for web crawling (single URL, recursive, or sitemap modes), this upgrade is strongly recommended — earlier builds returned zero crawled pages.
Helm Values Changes
No Helm values changes between 0.4.16 and 0.4.17. Drop-in upgrade — no values.yaml edits required.
Data Migrations
No MongoDB schema or data migrations required.
Upgrade Runbook
1. Update chart version
helm upgrade ai-platform-engineering \
oci://ghcr.io/cnoe-io/charts/ai-platform-engineering \
--version 0.4.17 \
-f your-values.yaml
2. Verify
kubectl get pods -n <namespace>
If you run the RAG stack, confirm a web ingestion crawl now returns pages:
kubectl get pods -n <namespace> | grep rag
Trigger a single-URL crawl and confirm pages are ingested rather than failing immediately at crawl start.
