Domain Name Forum


Python 03-17-2011 05:10 AM

How to extract the main body (text content) from an arbitrary webpage?
 
Hi all, in my current project I need to write Python code that extracts the text from tons of pages grabbed from the web. By extraction, I mean stripping all tags and comments and, if possible, filtering out small sections such as navigation links. The only thing that should be left is the lengthy paragraphs, if there are any. ...
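One common approach to this kind of extraction is a simple text-density heuristic. Split the page into blocks at block-level tags, drop anything inside script/style/head, ignore comments, then keep only blocks that are long enough and are not mostly link text (navigation menus are usually short and link-heavy). The sketch below uses only the standard library's `html.parser`. The names `BlockCollector`, `extract_main_text`, `BLOCK_TAGS`, `SKIP_TAGS`, `min_chars` and `max_link_density`, along with the default thresholds, are hypothetical choices for illustration and have not been tuned on real pages.

```python
from html.parser import HTMLParser

# Assumed (not exhaustive) set of tags that end one text block and start another.
BLOCK_TAGS = {"p", "div", "td", "tr", "table", "li", "ul", "ol", "section",
              "article", "header", "footer", "nav", "blockquote", "pre",
              "h1", "h2", "h3", "h4", "h5", "h6", "br", "body"}
# Tags whose contents are never visible body text.
SKIP_TAGS = {"script", "style", "noscript", "head", "title"}


class BlockCollector(HTMLParser):
    """Hypothetical helper: splits a page into (text, link_chars) blocks."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.blocks = []       # list of (normalized text, non-space chars inside <a>)
        self._parts = []       # raw text pieces of the current block
        self._link_chars = 0   # non-space chars of link text in the current block
        self._skip_depth = 0   # >0 while inside script/style/head/...
        self._in_link = 0      # >0 while inside <a>

    def _flush(self):
        # Join inline pieces and collapse all whitespace runs to single spaces.
        text = " ".join("".join(self._parts).split())
        if text:
            self.blocks.append((text, self._link_chars))
        self._parts = []
        self._link_chars = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag == "a":
            self._in_link += 1
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag == "a":
            self._in_link = max(0, self._in_link - 1)
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_data(self, data):
        if self._skip_depth:
            return  # drop script/style/head contents
        self._parts.append(data)
        if self._in_link:
            self._link_chars += len("".join(data.split()))

    # handle_comment is deliberately not overridden, so comments are discarded.

    def close(self):
        super().close()
        self._flush()  # emit any text left after the last block tag


def extract_main_text(html, min_chars=80, max_link_density=0.3):
    """Return the long, mostly non-link text blocks of `html`, separated by blank lines."""
    parser = BlockCollector()
    parser.feed(html)
    parser.close()
    kept = []
    for text, link_chars in parser.blocks:
        if len(text) < min_chars:
            continue  # too short: likely a menu item, caption or footer
        total_chars = len(text.replace(" ", ""))
        if link_chars / total_chars > max_link_density:
            continue  # mostly link text: likely navigation
        kept.append(text)
    return "\n\n".join(kept)
```

For example, `extract_main_text(page_html)` on a typical article page would ideally return just the article paragraphs. Because the only signals are block length and link density, pages with short paragraphs or link-heavy body text will need different thresholds, and some boilerplate blocks may still slip through.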

