Some sites have a URL pattern that runs from www.___.com/id=1 to www.___.com/id=1000. How can I crawl such a site using Nutch? Is there any way to provide seeds for fetching over a range?
From stackoverflow
-
I think the easiest way would be to have a script generate your initial list of URLs (the seed list), with one URL for each id in the range.
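A minimal sketch of such a script, in Python. The base URL `http://www.example.com/id=`, the output path `urls/seed.txt`, and the function names are assumptions made for illustration; substitute the real site and whatever seed directory you pass to Nutch.

```python
from pathlib import Path


def generate_seed_urls(base, start, end):
    """Return one URL per id in the inclusive range [start, end]."""
    return [f"{base}{i}" for i in range(start, end + 1)]


def write_seed_file(urls, path):
    """Write the URLs to a plain-text seed file, one URL per line."""
    path = Path(path)
    # Create the seed directory if it does not exist yet.
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text("\n".join(urls) + "\n", encoding="utf-8")


if __name__ == "__main__":
    # Hypothetical base URL and range; adjust to the site being crawled.
    urls = generate_seed_urls("http://www.example.com/id=", 1, 1000)
    write_seed_file(urls, "urls/seed.txt")
```

With Nutch 1.x the resulting directory can then be injected as usual, for example `bin/nutch inject crawl/crawldb urls`. It is also worth checking `conf/regex-urlfilter.txt`: the default rules may skip URLs containing characters such as `=` or `?`, in which case that rule would need to be relaxed for these seeds to be fetched.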