Some sites have URL patterns that run from www.___.com/id=1 to www.___.com/id=1000. How can I crawl such a site using Nutch? Is there any way to provide seeds for fetching in a range?
From stackoverflow
-
I think the easiest way would be to write a script that generates your initial list of URLs, one per id in the range, and use that list as the seed.
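Something like the following might do it. This is only a rough sketch: the `example.com` host, the `generate_seed_urls` and `write_seed_file` names, and the `urls/seed.txt` path are placeholders I made up, so substitute the real site and whatever seed directory you pass to Nutch.

```python
from pathlib import Path


def generate_seed_urls(template, start, end):
    """Return one URL per id in the inclusive range [start, end]."""
    return [template.format(id=i) for i in range(start, end + 1)]


def write_seed_file(urls, path):
    """Write the URLs one per line, creating the parent directory if needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text("\n".join(urls) + "\n", encoding="utf-8")


if __name__ == "__main__":
    # Hypothetical placeholder host; replace with the real site's URL pattern.
    template = "http://www.example.com/id={id}"
    urls = generate_seed_urls(template, 1, 1000)
    # Assumed seed location; use the directory you give to Nutch.
    write_seed_file(urls, "urls/seed.txt")
```

Nutch reads its seeds from plain text files with one URL per line, so the generated file can be injected like any hand-written seed list. One thing to check: the default `regex-urlfilter.txt` that I have seen skips URLs containing characters such as `?` and `=`, so you may need to relax that rule or URLs like `/id=1` could be filtered out before fetching.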