Skip to main content

Release 0.4.17 — Scrapy 2.16 Crawl Fix

Released: 2026-05-22 Chart: oci://ghcr.io/cnoe-io/charts/ai-platform-engineering:0.4.17 Previous release: 0.4.16

Highlights

0.4.17 is a targeted fix for RAG web ingestion. Scrapy 2.16.0 added validation that raised AttributeError whenever a spider defined a singular start_url attribute without a populated start_urls, which broke every crawl mode — single, recursive, and sitemap — resulting in zero pages crawled for any web ingestion request. This release renames the conflicting attribute, adds the new async spider entrypoint, and pins a stable Twisted to resolve a Scrapy 2.16 TLS bug.

What's New

No new features in this release — it is a focused bug-fix release for RAG web crawling.

Bug Fixes

  • rag: restore web crawling under Scrapy 2.16 — renamed the custom self.start_url attribute to self.origin_url across all five affected spiders (scrapy_worker.py, spiders/base.py, spiders/recursive.py, spiders/single_url.py, spiders/sitemap.py) to avoid Scrapy 2.16's new start_url/start_urls validation, added an async start() entrypoint for the new Scrapy API, and pinned twisted==26.4.0 to fix a Scrapy 2.16 TLS bug (#1489)

Breaking Changes

No breaking changes. Drop-in upgrade from 0.4.16.

Known Issues

None known at this time.

Upgrade

helm upgrade ai-platform-engineering \
oci://ghcr.io/cnoe-io/charts/ai-platform-engineering \
--version 0.4.17 \
-f your-values.yaml

Upgrade Guide: 0.4.16 → 0.4.17

Overview

Drop-in upgrade — no values.yaml edits required. 0.4.17 fixes a critical regression in RAG web ingestion caused by the Scrapy 2.16 upgrade. If you use the RAG stack for web crawling (single URL, recursive, or sitemap modes), this upgrade is strongly recommended — earlier builds returned zero crawled pages.

Helm Values Changes

No Helm values changes between 0.4.16 and 0.4.17. Drop-in upgrade — no values.yaml edits required.

Data Migrations

No MongoDB schema or data migrations required.

Upgrade Runbook

1. Update chart version

helm upgrade ai-platform-engineering \
oci://ghcr.io/cnoe-io/charts/ai-platform-engineering \
--version 0.4.17 \
-f your-values.yaml

2. Verify

kubectl get pods -n <namespace>

If you run the RAG stack, confirm a web ingestion crawl now returns pages:

kubectl get pods -n <namespace> | grep rag

Trigger a single-URL crawl and confirm pages are ingested rather than failing immediately at crawl start.