Download Shareware and Freeware Software for Windows, Linux, Macintosh, PDA



PyKHTML 0.2

  Date Added: April 24, 2010  |  Visits: 513




PyKHTML is a Python module for writing website scrapers/spiders. Whereas traditional methods focus on writing the code to parse HTML/forms themselves, PyKHTML uses the excellent KHTML engine to do all the drudge work. It therefore handles webpages very well (even the severely crufty ones) and is pretty darn fast (the engine is implemented in C++). As a bonus, the module handles JavaScript and cookies transparently.

How?

PyKHTML requires PyKDE 3 (and hence, in turn, PyQt 3 and the KDE libs). If you would like to run PyKHTML on servers without an X display, then Xvfb is required. Fortunately, these requirements should come bundled with most modern Linux distributions, and support for Windows/Mac should appear in the next few months.

Show me some code

Okay. Here is an example (one of many included in the bundle) that scrapes the title and navigation from the PyKHTML homepage, with excessive commenting to give you a feel for what programming with PyKHTML is like:

    import pykhtml

    PyKHTMLUrl = "http://paul.giannaros.org/pykhtml"

    def extractBitsFromPage(browser):
        # getElementsByTagName returns a generator, so we convert
        # to a list and access the first element
        title = list(browser.document.getElementsByTagName("title"))[0]
        print "Title:", title.text
        # Get the text of the navigation items
        navigation = []
        # First get the container of the list items...
        navigationElement = browser.document.getElementById("navigation")
        # ... and then loop over the li elements we find
        for listItem in navigationElement.getElementsByTagName("li"):
            # Inside the list item is an anchor
            anchor = listItem.children[0]
            # And the text inside the anchor is what we want
            navigation.append(anchor.text)
        print "Navigation:", " | ".join(navigation)
        # Stop here, we're done
        pykhtml.stopEventLoop()

    def main():
        browser = pykhtml.Browser()
        # the browser is passed as a parameter to extractBitsFromPage
        # when it is called (when the page has loaded)
        browser.load(PyKHTMLUrl, extractBitsFromPage)
        # kick things off
        pykhtml.startEventLoop()

    main()
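For contrast, here is a rough sketch of the "traditional" approach the description mentions, where you write the HTML-parsing code yourself. It is not part of PyKHTML: it uses only the Python 3 standard library (html.parser), and the sample markup, the TitleAndNavParser class and the scrape function are all hypothetical names made up for illustration. It also runs no JavaScript and handles no cookies, which is exactly the work PyKHTML hands off to KHTML.

```python
from html.parser import HTMLParser

# Hypothetical sample page: the markup of the real page is an assumption.
SAMPLE_HTML = """
<html>
  <head><title>PyKHTML</title></head>
  <body>
    <ul id="navigation">
      <li><a href="/">Home</a></li>
      <li><a href="/docs">Docs</a></li>
    </ul>
    <ul id="other"><li><a href="/x">Ignored</a></li></ul>
  </body>
</html>
"""


class TitleAndNavParser(HTMLParser):
    """Collects the <title> text and the anchor text inside id="navigation"."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.navigation = []
        self._in_title = False
        self._in_anchor = False
        self._nav_tag = None   # tag name of the navigation container, once seen
        self._nav_depth = 0    # nesting depth of that tag inside the container

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif self._nav_tag is None:
            # Outside the container: only watch for the container itself
            if dict(attrs).get("id") == "navigation":
                self._nav_tag, self._nav_depth = tag, 1
        elif tag == self._nav_tag:
            self._nav_depth += 1
        elif tag == "a":
            # Inside the container: start collecting this anchor's text
            self._in_anchor = True
            self.navigation.append("")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "a":
            self._in_anchor = False
        elif tag == self._nav_tag:
            self._nav_depth -= 1
            if self._nav_depth == 0:
                self._nav_tag = None   # left the navigation container

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._in_anchor:
            self.navigation[-1] += data


def scrape(html):
    """Return (title, navigation items) for an HTML string."""
    parser = TitleAndNavParser()
    parser.feed(html)
    parser.close()
    return parser.title.strip(), [item.strip() for item in parser.navigation]


if __name__ == "__main__":
    title, navigation = scrape(SAMPLE_HTML)
    print("Title:", title)
    print("Navigation:", " | ".join(navigation))
```

Even for this tiny page the parser has to track its own state by hand (which tag it is in, how deeply it is nested), whereas the PyKHTML example simply asks a real DOM for the elements it wants.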

Requirements: PyKDE 3 (which needs PyQt 3 and the KDE libs); Xvfb on servers without an X display
Platforms: Linux
Keyword: Http Internet List Module Navigation Pykhtml Python Website Writing
Users rating: 0/10

License: Freeware
Size: 26.62 KB


PYKHTML RELATED
Network & Internet  -  cdServer 0.8
cdServer is a simple HTTP server based on the standard Python library module SimpleHTTPServer. cdServer is designed to serve (static) contents off a CD-ROM. cdServer provides a simple interface for special (interactive) functions implemented in an...
84.99 KB  
Network & Internet  -  Python-CDDB 0.1.4
Python-CDDB (PyCDDB) is a module written in Python to access a CDDB-server and get information about discs like: artist, disc-title, track titles and more.
215.04 KB  
Modules  -  rwhois.py 1.5
This is a recursive whois module/client for Python. It provides your typical whois lookup and the ability to parse records into usable objects.
 
Programming  -  StepSim for Linux 0.5.3
StepSim is a lightweight step-based simulation module written in Python. It can do simple real-time simulations of discrete systems. StepSim supports step-by-step simulation or can run until a break cond... StepSim is more deterministic now:...
30.72 KB  
E-Mail Tools  -  Python milter 0.8.7
The Python milter module provides a Python interface to Sendmail's libmilter that exploits all its features. Milters can run on the same machine as sendmail, or another machine. The milter can even run with a different operating system or processor...
102.4 KB  
Utilities  -  WSGIUserAgentMobile 0.2.12
Mobile user agent string parser for WSGI applications. WSGIUserAgentMobile is a Python module that parses HTTP mobile user agent strings. It is useful for parsing HTTP_USER_AGENT strings of (mainly Japanese) mobile devices. This...
30.72 KB  
Networking  -  Navigation Du Lapin Blanc 1.0.3
This plugin provides integrated navigation for your website. Thus you can use WordPress as a CMS for your website and think in terms of main navigation, sub navigation etc. A navigation item can link to a page, a category, directly to the first sub...
40.96 KB  
Audio Tools  -  pymad 0.5.4
pymad is a Python module that allows Python programs to use the MPEG Audio Decoder library. pymad provides a high-level API, similar to the pyogg module, which makes reading PCM data from MPEG audio streams a piece of cake. Using pymad is as...
21.5 KB  
Modules  -  Freestyle FAQ Lite 1.5.6
Freestyle FAQ Lite is designed to provide you with a highly customised Frequently Asked Questions (FAQs) module on your Joomla website. There are various customisable options, you can display FAQs under a menu item or within a module...
174.08 KB  
Modules  -  Indymedia cities list 6.x-1.3
The Indymedia cities list module generates an up-to-date list of indymedia sites. For PHP 4-based sites, the list is updated from an HTML version. For PHP 5-based sites, the list is updated from an XML version, and the listing is fully...
10 KB  
NEW DOWNLOADS IN LINUX SOFTWARE, NETWORK & INTERNET
Linux Software  -  Polling Autodialer Software 3.4
ICTBroadcast Auto Dialer software has a survey campaign for telephone surveys and polls. This auto dialer software automatically dials a list of numbers and asks them a set of questions that they can respond to, by using their telephone keypad....
488 B  
Linux Software  -  Total Video Converter Mac Free 3.5.5
Total Video Converter Mac Free, developed by EffectMatrix Ltd, is the official legal version of Total Video Converter, which has been a globally recognized brand since 2006. Total Video Converter Mac Free is a free but powerful all-in-one video...
17.7 MB  
Linux Software  -  Skeith mod_log_sql Analyzer 2.10beta2
Skeith is a PHP-based front end for analyzing Apache logs using mod_log_sql.
47.5 KB  
Linux Software  -  SLAX 6.0+
Slax is a modern, portable, small and fast Linux operating system with a modular approach and outstanding design. Despite its small size, Slax provides a wide collection of pre-installed software for daily use, including a well organized graphical...
190 KB  
Linux Software  -  GTK+ 2.5
GTK+, which stands for the GIMP Toolkit, is a library for creating graphical user interfaces for the X Window System. It is designed to be small, efficient, and flexible. GTK+ is written in C with a very object-oriented approach. Language bindings...
60 MB  
Network & Internet  -  Free WiFi Hotspot 3.3.1
Free WiFi Hotspot is a super easy solution to turn your laptop or notebook into a portable Wi-Fi hotspot, wirelessly sharing your internet connections like DSL, Cable, Bluetooth, Mobile Broadband Card, Dial-Up, etc. through the built-in wireless...
1.04 MB  
Network & Internet  -  Easy Uploads 1.8
Easy Uploads is a file storage and media streaming application designed by Filestreamers that allows you to upload, store, and stream your files from their virtually unlimited file storage server. Easy Uploads can back up, share, and stream your files...
615.97 KB  
Network & Internet  -  IPv6 CARE 3.2b
IPv6 CARE, "IPv6 Compliant Automatic Runtime Environment", is a Linux tool able to patch ipv6-agnostic programs on-the-fly ('patch' mode). It can also generate a diagnosis about the IPv6 compliance of an application ('check' mode).
409.6 KB  
Network & Internet  -  PacketFence ZEN 3.1.0
PacketFence is a fully supported, trusted, Free and Open Source network access control (NAC) system. Boasting an impressive feature set including a captive-portal for registration and remediation, centralized wired and wireless management, 802.1X...
1024 MB  
Network & Internet  -  django-dbstorage 1.3
A Django file storage backend for files in the database.
10.24 KB