Tuesday, April 5, 2011

Nutch crawling when the seed URLs fall in a numeric range

Some sites have a URL pattern that runs from www.___.com/id=1 to www.___.com/id=1000. How can I crawl such a site using Nutch? Is there any way to provide seeds for fetching as a range?

From Stack Overflow:
  • I think the easiest way would be to have a script generate your initial list of URLs.
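
A minimal sketch of such a script follows. It assumes, hypothetically, that the site is www.example.com (the real host is elided in the question) and that each id goes directly after "id=". It writes one URL per line into urls/seed.txt, which is the plain-text seed format Nutch 1.x reads from a seed directory.

```python
"""Write a Nutch seed file with one URL per id in a numeric range.

Hypothetical assumptions (not from the original question):
  * the host is www.example.com, since the real host was elided;
  * ids run from 1 to 1000 inclusive and follow "id=" directly.
"""
import os

# Hypothetical URL template; replace the host with the real site.
URL_TEMPLATE = "http://www.example.com/id={}"


def seed_urls(template, first, last):
    """Return the URLs for ids first..last, inclusive of both ends."""
    return [template.format(i) for i in range(first, last + 1)]


def write_seed_file(urls, directory="urls", filename="seed.txt"):
    """Write one URL per line to directory/filename and return the path."""
    os.makedirs(directory, exist_ok=True)  # create the seed dir if missing
    path = os.path.join(directory, filename)
    with open(path, "w", encoding="utf-8") as f:
        for url in urls:
            f.write(url + "\n")
    return path


if __name__ == "__main__":
    urls = seed_urls(URL_TEMPLATE, 1, 1000)
    path = write_seed_file(urls)
    print(f"wrote {len(urls)} seed URLs to {path}")
```

The resulting directory can then be injected as usual, e.g. bin/nutch inject crawl/crawldb urls on Nutch 1.x. One thing to check: the stock conf/regex-urlfilter.txt contains a -[?*!@=] rule that drops URLs containing "=", so URLs like these may be filtered out unless that line is commented out or relaxed.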
