PropertyGuru SG Sale Browser Crawl
v0.1.0
Extract around 50 Singapore for-sale listings from a PropertyGuru search results URL using a real browser session after Cloudflare verification. Use when the target is a PropertyGuru Singapore search page, direct HTTP fetch may return 403, the reliable source is `window.__NEXT_DATA__.props.pageProps.pageData.data.listingsData`, and results should be deduplicated by listing id across sequential pages until the target count is reached.
Use this skill for PropertyGuru Singapore search results pages when the job is to collect roughly 50 listing cards from one filtered search URL.
This target is browser-backed.
A direct fetch may return a Cloudflare verification page or a 403. Prefer a real browser session and extract from the page's hydrated Next.js data. Do not treat DOM card scraping as the primary source when __NEXT_DATA__ is available.
Required skill
This skill depends on playwright.
Use a real browser page. Let the browser complete PropertyGuru's Cloudflare verification first. Only extract after the page title and results page content have loaded.
Workflow
1. Read {baseDir}/references/source-notes.md.
2. Start from the user-provided search URL. If no URL is supplied, use the default URL from the source notes.
3. Open the page in a real browser.
4. Wait until the search results page is actually loaded, not the initial verification screen.
5. Read window.__NEXT_DATA__.props.pageProps.pageData.
6. Use pageData.data.listingsData as the canonical listing collection for that page.
7. For each item in listingsData, use listingData.id as the stable dedupe key.
8. Preserve the raw listingData object whenever possible.
9. Optionally add lightweight wrapper fields such as:
   - source_url
   - page
   - collected_at
   - listing_id
10. Continue page-by-page until one of these conditions is met:
   - 50 unique listings have been collected
   - paginationData.currentPage >= paginationData.totalPages
   - listingsData is empty
   - the next page repeats only ids already seen
For the default PropertyGuru URL observed on March 18, 2026, the page payload contained 25 listings per page, so pages 1 and 2 were enough to reach 50 listings.
Canonical data location
Prefer:
window.__NEXT_DATA__.props.pageProps.pageData.data.listingsData
Related pagination data:
window.__NEXT_DATA__.props.pageProps.pageData.data.paginationData
Useful search context:
window.__NEXT_DATA__.props.pageProps.pageData.searchParams
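The workflow and the data paths above can be sketched in Python roughly as follows. This is a minimal sketch, not the skill's actual implementation: the helper names `with_page`, `read_page_data`, and `crawl` are hypothetical, `page` is assumed to be a Playwright page object (only its `evaluate` method is used), and paging by rewriting the `page` query parameter is an assumption based on the example search URL.

```python
from typing import Callable, Dict, List
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_page(url: str, page_no: int) -> str:
    """Return `url` with its `page` query parameter set to `page_no` (assumed paging scheme)."""
    parts = urlsplit(url)
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != "page"]
    query.append(("page", str(page_no)))
    return urlunsplit(parts._replace(query=urlencode(query)))


def read_page_data(page) -> dict:
    """Read props.pageProps.pageData from a Playwright-style page; {} if any level is missing."""
    node = page.evaluate("() => window.__NEXT_DATA__ || null") or {}
    for key in ("props", "pageProps", "pageData"):
        node = node.get(key) or {}
    return node


def crawl(load_page_data: Callable[[int], dict], target: int = 50) -> List[Dict]:
    """Collect up to `target` unique listingData objects, one page at a time."""
    seen = set()
    collected: List[Dict] = []
    page_no = 1
    while len(collected) < target:
        data = load_page_data(page_no).get("data") or {}
        listings = data.get("listingsData") or []
        if not listings:
            break  # stop: listingsData is empty
        added = 0
        for item in listings:
            listing = item.get("listingData") or {}
            listing_id = listing.get("id")
            if listing_id is None or listing_id in seen:
                continue  # dedupe strictly on listingData.id
            seen.add(listing_id)
            collected.append(listing)
            added += 1
            if len(collected) >= target:
                break  # stop: target count reached
        if added == 0:
            break  # stop: the page repeated only ids already seen
        pagination = data.get("paginationData") or {}
        total = pagination.get("totalPages")
        if total is not None and pagination.get("currentPage", page_no) >= total:
            break  # stop: last page reached
        page_no += 1
    return collected
```

With Playwright, `load_page_data` could be a small function that calls `page.goto(with_page(search_url, n))`, waits for the results page to finish loading, and returns `read_page_data(page)`. Keeping the loop independent of the browser makes the stop conditions easy to exercise with canned payloads.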
Recommended extraction shape
Prefer keeping the raw listing payload plus a few convenience fields:
{
  "source_url": "https://www.propertyguru.com.sg/property-for-sale?listingType=sale&page=1&isCommercial=false&maxPrice=1400000",
  "page": 1,
  "collected_at": "2026-03-18T04:40:00Z",
  "listing_id": 500044843,
  "raw": {
    "id": 500044843,
    "localizedTitle": "780B Woodlands Crescent",
    "url": "https://www.propertyguru.com.sg/listing/hdb-for-sale-780b-woodlands-crescent-500044843",
    "price": {
      "value": 500000,
      "pretty": "S$ 500,000"
    },
    "bedrooms": 2,
    "bathrooms": 2
  }
}
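A record in this shape could be assembled with a small helper like the one below. It is only an illustrative sketch: `build_record` is a hypothetical name, and the optional `collected_at` argument is added here so a caller can pin the timestamp; when omitted, the current UTC time is used.

```python
from datetime import datetime, timezone
from typing import Optional


def build_record(listing: dict, source_url: str, page: int,
                 collected_at: Optional[str] = None) -> dict:
    """Wrap one raw listingData object with the convenience fields."""
    if collected_at is None:
        # ISO 8601 UTC timestamp, matching the "2026-03-18T04:40:00Z" style
        collected_at = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return {
        "source_url": source_url,          # the search URL the listing came from
        "page": page,                      # results page number
        "collected_at": collected_at,
        "listing_id": listing.get("id"),   # dedupe key, copied out for convenience
        "raw": listing,                    # untouched listingData payload
    }
```

The raw object is stored by reference rather than copied, so nothing from the original payload is lost or reshaped.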
If the caller wants a flatter convenience export, these fields are usually available:
- listingData.id
- listingData.localizedTitle
- listingData.url
- listingData.price.value
- listingData.price.pretty
- listingData.area.localeStringValue
- listingData.bedrooms
- listingData.bathrooms
- listingData.fullAddress
- listingData.property
- listingData.psfText
- listingData.postedOn
- listingData.agent
- listingData.agency
- listingData.mrt
- listingData.isVerified
Operating rules
- Use browser extraction as the default path.
- Do not rely on curl, plain HTTP, or static HTML parsing as the primary strategy.
- Do not scrape promo widgets, "Explore around" cards, or other injected recommendation blocks from the visible DOM.
- Use only listingsData for the main dataset.
- Crawl one page at a time.
- Deduplicate strictly on listingData.id.
- Stop as soon as the requested target count is reached.
- Preserve the search URL and page number with every saved record.
- If the page falls back to a Cloudflare challenge and does not recover, report the block explicitly instead of pretending the page is empty.
Output target
Default target: about 50 unique listings.
Prefer pages 1 and 2 first. If one page returns fewer rows than expected, continue to page 3 and beyond until the target count is reached.
Notes
- PropertyGuru may change the page structure, build id, or anti-bot behavior at any time.
- When the page changes, re-check __NEXT_DATA__ before changing extraction logic.
- For this skill, the in-page Next.js payload is more stable than card-by-card DOM parsing.
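Because the payload structure may change, the flatter convenience export mentioned earlier is probably best written so that a missing field yields `None` instead of an exception. The sketch below is one way to do that; `dig`, `FLAT_FIELDS`, and `flatten_listing` are hypothetical names, and the output column names are illustrative choices, not part of the PropertyGuru payload. Fields whose shape is not documented here (for example `agent` or `mrt`) are passed through unchanged.

```python
from typing import Any, Dict, Tuple


def dig(obj: Any, *keys: str) -> Any:
    """Follow `keys` through nested dicts; return None as soon as a level is missing."""
    for key in keys:
        if not isinstance(obj, dict):
            return None
        obj = obj.get(key)
    return obj


# Output column name -> path inside listingData (column names are illustrative).
FLAT_FIELDS: Dict[str, Tuple[str, ...]] = {
    "id": ("id",),
    "title": ("localizedTitle",),
    "url": ("url",),
    "price_value": ("price", "value"),
    "price_pretty": ("price", "pretty"),
    "area": ("area", "localeStringValue"),
    "bedrooms": ("bedrooms",),
    "bathrooms": ("bathrooms",),
    "full_address": ("fullAddress",),
    "property": ("property",),
    "psf_text": ("psfText",),
    "posted_on": ("postedOn",),
    "agent": ("agent",),
    "agency": ("agency",),
    "mrt": ("mrt",),
    "is_verified": ("isVerified",),
}


def flatten_listing(listing: dict) -> dict:
    """Project one raw listingData object onto the flat export columns."""
    return {name: dig(listing, *path) for name, path in FLAT_FIELDS.items()}
```

A row produced this way always has the same set of columns, so a sudden run of `None` values in one column is a useful signal that the payload changed and __NEXT_DATA__ should be re-checked.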