
HTML Entity Based Codepage Inference 0.01

  Date Added: March 06, 2010  |  Visits: 1.071







HEBCI is a technique that lets a web form handler transparently detect the character set its data was encoded with. By using carefully chosen character references, the browser's encoding can be inferred. It is therefore possible to guarantee that data is in a standard encoding without relying on (often unreliable) webserver/browser encoding interactions.

The ideal solution would be entirely browser-neutral and passive. Unfortunately, the HTML spec doesn't define any mechanism for this, so we need to find some other, sneakier way to extract the current character encoding from the browser.

Luckily for us, there is a trick we can use: entity codes. Entity codes are strings like &amp;, which were (and still are) used to represent specific characters without including them literally. When the browser displays a page, it replaces each of these with the corresponding character in the current encoding. Thus, &amp; becomes the character 0x26 in most codepages.

By itself, this is merely implementation trivia. However, the same translation applies whenever a user submits a form: the browser parses any entities in the form values and, when the user clicks submit, sends those characters using the current encoding's representation of them. Any entity codes within the form fields are therefore passed along as character values in the browser's current encoding.

So all we have to do is find an entity that is encoded differently in two different codepages. We slip that into a form field and then look at its value when the data arrives, which lets us tell the two encodings apart. In fact, we can look at all entities in many codepages and find the ones that let us disambiguate among many codepages at once. This is what I've done: we add hidden form elements whose values contain various entity codes, such as &deg;, &divide;, and &mdash;. Then, when the user submits the form, we take each of those values and compare it against a table of which character has which value in which codepage.
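The setup described above (hidden probe fields plus a table of per-codepage values) can be sketched in Python. This is a minimal illustration, not the tool's own code: the field names (`hebci0`, `hebci1`, ...), the list of candidate codepages, and the use of Python's codecs to model browser encoding are all assumptions. Characters a codepage cannot represent are encoded as numeric character references (`xmlcharrefreplace`), which is assumed to approximate what browsers send for such characters.

```python
# Minimal sketch: hidden probe fields and a codepage-to-fingerprint table.
# Assumptions (not from the original tool): probe entities, hypothetical
# field names, and Python codec names standing in for browser encodings.
import html

PROBE_ENTITIES = ["&deg;", "&divide;", "&mdash;"]
CANDIDATE_CODEPAGES = ["utf-8", "mac_roman", "cp1252", "iso8859_1", "cp437"]


def hidden_fields(entities=PROBE_ENTITIES):
    """Render one hidden form field per probe entity (names are hypothetical)."""
    return "\n".join(
        f'<input type="hidden" name="hebci{i}" value="{entity}">'
        for i, entity in enumerate(entities)
    )


def fingerprint(codepage, entities=PROBE_ENTITIES):
    """Bytes each probe would arrive as if the form were submitted in `codepage`.

    Unrepresentable characters become numeric character references,
    assumed here to mimic browser behaviour.
    """
    return tuple(
        html.unescape(entity).encode(codepage, errors="xmlcharrefreplace")
        for entity in entities
    )


def build_table(codepages=CANDIDATE_CODEPAGES):
    """Map each fingerprint to its codepage, refusing ambiguous entries."""
    table = {}
    for cp in codepages:
        fp = fingerprint(cp)
        if fp in table:
            raise ValueError(f"{cp} is indistinguishable from {table[fp]}")
        table[fp] = cp
    return table


if __name__ == "__main__":
    print(hidden_fields())
    for fp, cp in build_table().items():
        print(cp, [b.hex() for b in fp])
```

For MacRoman this yields `a1, d6, d1` and for UTF-8 `c2b0, c3b7, e28094`, the same values quoted in the description. The ambiguity check in `build_table` is one way to confirm that a chosen set of entities really does separate the candidate codepages.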
That is, each codepage has a unique fingerprint for the values of &deg;, &divide;, and &mdash;. For MacRoman, it's a1, d6, d1; for UTF-8, c2b0, c3b7, e28094. We therefore only have to go through our table of codepage-to-fingerprint mappings and see which fingerprint matches. Note that once this table has been built, the cost of fingerprinting a given form submission is very low. And in the case of a miss, you can assume whatever your page's default codepage is. This fallthrough case is equivalent to what the code would have done before the detection layer was added.
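The lookup and fallthrough step can be sketched as a dictionary lookup. This is a hypothetical illustration: it assumes the handler has access to the raw, still-undecoded bytes of each form field, and the names `detect_codepage`, `decode_form`, and `DEFAULT_CODEPAGE` are invented for the example. The table is rebuilt inline so the snippet runs on its own.

```python
# Minimal sketch of HEBCI matching with a default-codepage fallback.
# Assumptions: `fields` maps field names to raw undecoded bytes, and the
# probe fields are submitted in a known order; all names are hypothetical.
import html

PROBE_ENTITIES = ["&deg;", "&divide;", "&mdash;"]
CANDIDATE_CODEPAGES = ["utf-8", "mac_roman", "cp1252", "iso8859_1"]
DEFAULT_CODEPAGE = "utf-8"  # stands in for "your page's default codepage"


def fingerprint(codepage):
    """Expected raw bytes of each probe field under `codepage`."""
    return tuple(
        html.unescape(e).encode(codepage, errors="xmlcharrefreplace")
        for e in PROBE_ENTITIES
    )


# Built once up front; each submission then costs a single dict lookup.
TABLE = {fingerprint(cp): cp for cp in CANDIDATE_CODEPAGES}


def detect_codepage(submitted, default=DEFAULT_CODEPAGE):
    """Return the codepage whose fingerprint matches, else fall through."""
    return TABLE.get(tuple(submitted), default)


def decode_form(fields, probe_names, default=DEFAULT_CODEPAGE):
    """Decode every raw field using the codepage inferred from the probes."""
    probes = [fields.get(name, b"") for name in probe_names]
    codepage = detect_codepage(probes, default)
    decoded = {
        name: value.decode(codepage, errors="replace")
        for name, value in fields.items()
    }
    return decoded, codepage
```

For example, a submission whose probe fields arrive as `a1`, `d6`, `d1` is identified as MacRoman, so a `comment` field sent as `b"caf\x8e"` decodes to `café`. Any unrecognised fingerprint simply returns the default, matching the fallthrough behaviour described above.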

Requirements: No special requirements
Platforms: Linux
Keyword: Based Character Character Set Codepage Entity Entity Codes Form Form Handler Hebci Html Html Entity Html Entity Based Codepage Inference Inference Its Data Web Form
Users rating: 0/10

License: Freeware
Size: 5.12 KB


HTML ENTITY BASED CODEPAGE INFERENCE RELATED
Finance  -  Character Set Converter 2.0.0.13
Character Set Converter is a conversion tool designed to help you convert text documents from one character set to another. It supports nearly all ISO 8859 character sets, all DOS character sets, most important Apple character sets...
 
Security Tools  -  HXTT Character Set Converter 1.0
HXTT Character Set Converter is a free character set conversion toolkit for text files. It should work on any platform that supports Java.
 
Audio Tools  -  RusXMMS 0.2.2
RusXMMS provides character set conversion for languages which can be represented with more than one character set. RusXMMS project originally handled XMMS playlists, but can be useful for any program that works with small pieces of text in...
389.12 KB  
Development Editors  -  HtmlEntities 0.2.3
The HTML Entity Character Lookup was originally developed by Remy Sharp as an HTML + JavaScript web app and as an OS X Dashboard widget. The only way for Windows users to use that app offline was the HTML Entity Character Lookup...
 
Libraries  -  Unicode::MapUTF8 1.11
Unicode::MapUTF8 is a Perl module with conversions to and from arbitrary character sets and UTF8. SYNOPSIS use Unicode::MapUTF8 qw(to_utf8 from_utf8 utf8_supported_charset); # Convert a string in ISO-8859-1 to UTF8 my $output = to_utf8({...
16.38 KB  
Text Editors  -  Mguesser 0.4
WHAT'S THIS? mguesser is a standalone part of libmnogosearch (the core of the mnogo search engine, http://www.mnogosearch.org) that can guess the character set and language of a text file. mguesser is implemented using...
143.36 KB  
Development Editors  -  Quoter 5.1
Converts text with many possible cleanups, including preparation of HTML and Java, aligning in columns, character set conversion, case converting, removing excess white space, removing blank lines, preparing regex expressions.... converts raw text...
1.89 MB  
Specialized Tools  -  Quoter Amanuensis 5.1 B9449
Converts text with many possible cleanups, including preparation of HTML and Java, aligning in columns, character set conversion, case converting, removing excess white space, removing blank lines, preparing regex expressions.... Converts raw text...
1.2 MB  
Libraries  -  Unicode::Map8 0.12
Unicode::Map8 is a mapping table between 8-bit chars and Unicode. SYNOPSIS require Unicode::Map8; my $no_map = Unicode::Map8->new("ISO646-NO") || die; my $l1_map = Unicode::Map8->new("latin1") || die; my $ustr = $no_map->to16("V}re norske...
102.4 KB  
Libraries  -  MIME::WordDecoder 5.420
MIME::WordDecoder is a Perl module to decode RFC-1522 encoded words to a local representation. SYNOPSIS See MIME::Words for the basics of encoded words. See "DESCRIPTION" for how this class works. use MIME::WordDecoder; ### Get the default...
378.88 KB  