Monday, October 20, 2003

The xch Perl script (posted on Oct 11) Xtracts URLs from a file and CHallenges the proxy server to determine which URLs are blocked.

xch source_file out_file

will extract URLs from source_file (an HTML file of links to other sites) and use curl to retrieve the HTTP header of each URL. If the header corresponds to Error 403 Forbidden, that (usually) means the URL is on the SBA blacklist. The URL will then be written to out_file.

As I mentioned earlier, this algorithm stopped working sometime between 3Q last year and this year. SCV's proxy server now returns a header that says everything is OK, but transparently gives you back an HTML document saying the URL has been blocked by the caching server.

Maybe I can continue to retrieve just the headers and rely on the "Content-Length: 1090" string in the header as a marker for blacklisted sites.
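
If that pans out, the header-only check could be as simple as this sketch (the 1090 figure is just the size observed above and could easily change; URLs are read one per line):

#!/usr/bin/perl -w
# Header-only check (sketch): treat "Content-Length: 1090", the size of
# the SCV block page observed above, as the blacklist marker.
while (<>) {
    chomp;
    my $head = `curl --head --silent --connect-timeout 1 --max-time 3 "$_"`;
    print "$_ probably blocked\n" if $head =~ /Content-Length: 1090\b/;
}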

If you just want to extract URLs from a file but not challenge the proxy server, use

xch -x source_file challenge_list

This puts all the extracted URLs into challenge_list.

If you already have a list of URLs you want to test, use

xch -c challenge_list out_file

where challenge_list is the list of URLs in a text file, one URL per line.

Saturday, October 18, 2003

The list I posted earlier (October 12, 2003) was compiled around 3Q 2002, but when I checked it recently, I discovered that some of the sites that used to be blocked are now unblocked. SCV also changed its proxy server, so my old xch script didn't work any more.

Specifically, before, I could just check the HTTP header to see whether a URL was blocked (Error 403 Forbidden). Now the header that comes back says OK, but the server actually sends back a page saying the URL has been blocked. That means we have to ask curl to retrieve the whole web page before we can tell whether it has been blacklisted, which consumes a lot more bandwidth.
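
In outline, the new check looks something like this (a sketch only; the message text is the same string the updated xch script below searches for, and URLs come one per line on stdin):

#!/usr/bin/perl -w
# Full-page check (sketch): fetch the whole document and look for the
# block-page text instead of trusting the header.
my $banned = "The URL you requested has been blocked by the caching server";
while (<>) {
    chomp;
    my $page = `curl --silent --connect-timeout 1 --max-time 3 "$_"`;
    print "$_ blocked\n" if $page =~ /\Q$banned\E/;
}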

Anyway, a lot of the sites on the old blacklist are not blacklisted anymore. Only 26 out of the original 59 sites are still blocked. Reason unknown. No obvious pattern for what's in or out.

Rot13'd as usual (see Oct 12 post for decoders):

-------
uggc://jjj.4nqhygfbayl.pbz
uggc://jjj.nqhygf-bayvar.pbz
uggc://jjj.nfvnacyrnfherf.pbz
uggc://jjj.obbglpnzc.pbz
uggc://jjj.pnsrsyrfu.pbz
uggc://jjj.pbzchgrefrk.pbz
uggc://jjj.qnavfurebgvpn.pbz
uggc://jjj.robalsnagnfl.pbz
uggc://jjj.srgvfuubgry.pbz
uggc://jjj.serrurnira.pbz
uggc://jjj.cragubhfr.pbz
uggc://jjj.cragubhfryvir.pbz
uggc://jjj.cragubhfrznt.pbz
uggc://jjj.cvaxpubpbyngr.pbz
uggc://jjj.cynlobl.pbz
uggc://jjj.cbeagi.pbz
uggc://jjj.chffl.pbz
uggc://jjj.chfflynaq.pbz
uggc://jjj.frk.pbz
uggc://jjj.frkrqvgvba.pbz
uggc://jjj.fvzcylnfvna.pbz
uggc://jjj.fbncl.pbz
uggc://jjj.hasnvgushy.pbz
uggc://jjj.jvyqpureelf.pbz
uggc://jjj.jbeyqfrk.pbz
----------

Are any of the sites on my challenge list now blocked that were not blocked before? Don't know. I'm reluctant to re-run the test completely, because the change in the SCV proxy's response means I would have to retrieve entire web pages instead of just headers, a few thousand times.

Did find a few new sites by doing variations on the original 59-site list. Just tacked a variety of country-code TLDs onto each name and challenged the proxy again, e.g. www.playboy.com becomes www.playboy.com.hk, .tw, .au, .co.jp etc. A quick script for this is sketched below.
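
A throwaway script along these lines generates the variants (the suffix list and the ccvar.pl name are just for illustration):

#!/usr/bin/perl -w
# Sketch: emit ccTLD variants of each bare .com URL in a challenge list,
# e.g. http://www.playboy.com -> http://www.playboy.com.hk and so on.
my @suffixes = qw(.hk .tw .au .co.jp .sg);
while (<>) {
    chomp;
    next unless m{^http://[^/]+\.com$};   # only vary bare .com hosts
    foreach my $s (@suffixes) {
        print "$_$s\n";
    }
}

Then perl ccvar.pl challenge_list > variants and xch -c variants out_file as usual.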

This gave a few more sites:
-------
uggc://jjj.4nqhygfbayl.pbz.nh
uggc://jjj.nqhygynaq.pbz.nh
uggc://jjj.ploretveyf.pbz.nh
uggc://jjj.ubgfrk.pbz.nh
uggc://jjj.chffl.pbz.nh

uggc://jjj.cynlobl.pbz.gj
--------

(.nh = .au and .gj = .tw)


Thursday, October 16, 2003

ARTFL Project: Webster Dictionary, 1913:

"Hack, a. Hackneyed; hired; mercenary. Wakefield.

Hack writer, a hack; one who writes for hire. 'A vulgar hack writer.' Macaulay."

So what exactly is a hack?

A Hack's Progress:

"Hack:
A common drudge; especially a literary drudge; hence a poor writer, a mere scribbler.

Hack:
A half-breed horse with more bone and substance than a thoroughbred.

Hack (verb):
To make rough cuts, to deal cutting blows."

Sunday, October 12, 2003

Rot13'd to discourage attention from robots and such.

Cut and paste the list into a ROT13 JavaScript coder/decoder if you don't have another decoder.

Or use this Perl script,

#!/usr/bin/perl -p
y/A-Za-z/N-ZA-Mn-za-m/;

shamelessly stolen from http://www.miranda.org/~jkominek/rot13/
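
Save those two lines as, say, rot13.pl and run

perl rot13.pl encoded_file

to print the decoded list. Since ROT13 is its own inverse, the same command encodes and decodes.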

This is the list as compiled in late 2002.

-------------------------------
uggc://jjj.4nqhygfbayl.pbz
uggc://jjj.nqhygynaq.pbz
uggc://jjj.nqhygf-bayvar.pbz
uggc://jjj.nyyfrklzra.pbz
uggc://jjj.nyygrraf.pbz
uggc://jjj.nfvnaahqrf.pbz
uggc://jjj.nfvnacyrnfherf.pbz
uggc://jjj.onqtvey.pbz
uggc://jjj.obbglpnzc.pbz
uggc://jjj.oernfgf.pbz
uggc://jjj.pnsrsyrfu.pbz
uggc://jjj.pnebypbk.pbz
uggc://jjj.pryroahqr.pbz
uggc://jjj.pbzchgrefrk.pbz
uggc://jjj.ploretveyf.pbz
uggc://jjj.qnavfurebgvpn.pbz
uggc://jjj.robalsnagnfl.pbz
uggc://jjj.rebgvpnkkk.pbz
uggc://jjj.srgvfuubgry.pbz
uggc://jjj.serrqnvylcbea.pbz
uggc://jjj.serrurnira.pbz
uggc://jjj.serrcvpgherftnyyrel.pbz
uggc://jjj.tveylfrkcvpf.pbz
uggc://jjj.ubgfrk.pbz
uggc://jjj.ubgfrkgrraf.pbz
uggc://jjj.ubggrra.pbz
uggc://jjj.vgrraf.pbz
uggc://jjj.zrfflphzfubgf.pbz
uggc://jjj.zgerrkkk.arg
uggc://jjj.anxrqtveyf.pbz
uggc://jjj.cragubhfr.pbz
uggc://jjj.cragubhfryvir.pbz
uggc://jjj.cragubhfrznt.pbz
uggc://jjj.crefvnaxvggl.pbz
uggc://jjj.cvpghercbfg.pbz
uggc://jjj.cvpjnerubhfr.pbz
uggc://jjj.cvaxpubpbyngr.pbz
uggc://jjj.cynlobl.pbz
uggc://jjj.cbeaab.pbz
uggc://jjj.cbeagi.pbz
uggc://jjj.chffl.pbz
uggc://jjj.chfflynaq.pbz
uggc://jjj.frk.pbz
uggc://jjj.frkrqvgvba.pbz
uggc://jjj.frkarg.pbz
uggc://jjj.frkfubccre.pbz
uggc://jjj.frkgi.pbz
uggc://jjj.frklpbyyrtrtveyf.pbz
uggc://jjj.fvzcylnfvna.pbz
uggc://jjj.fbncl.pbz
uggc://jjj.grratbqqrff.pbz
uggc://jjj.gurbetl.pbz
uggc://jjj.hasnvgushy.pbz
uggc://jjj.ipq1.pbz
uggc://jjj.jrgayhfgl.pbz
uggc://jjj.jvyqpureelf.pbz
uggc://jjj.jbeyqfrk.pbz
uggc://jjj.kcvk.arg

Saturday, October 11, 2003

Perl script to test a list of URLs for ones blocked by the proxy server.

------------------------------------------------------------------
#!/usr/bin/perl -w
# Xtract & CHallenge
# Extract URLs from a given file and challenge the proxy server with the URLs
# Takes 2 or 3 filename arguments:
# xch input-file [scratch-file] output-file
# If only 2 filenames are given, the scratch-file will be named output-file.scr
#
# URL-matching regexp modified from qxurl by tchrist@perl.com in perlfaq9
#

# Check for options

# Bail out early if no arguments at all
die "Usage: xch [-scv|-msh|-mst] [-c|-x] source-file [intermediate-file] out-file\n" unless @ARGV;

# Default ISP is SCV
$isp = "scv";

if (($ARGV[0] eq "-scv") or ($ARGV[0] eq "-mst") or ($ARGV[0] eq "-msh")) {
    # Set ISP
    $isp = substr($ARGV[0], 1);
    # Go to next argument
    shift @ARGV;
}
elsif (($ARGV[0] =~ /^-/) and ($ARGV[0] ne "-c") and ($ARGV[0] ne "-x")) {
    # Any other "-" option is an error
    die "Usage: xch [-scv|-msh|-mst] [-c|-x] source-file [intermediate-file] out-file\n";
}

if ($isp eq "scv") {
    $isp_long = "SCV MaxOnline";
}
elsif ($isp eq "msh") {
    $isp_long = "MyStarHub";
}
else {
    $isp_long = "MySingTel";
}

print STDOUT "ISP is $isp_long\n";

if ($ARGV[0] =~ /^-/) {
    if ($ARGV[0] eq "-c") {
        # Challenge only
        $x = 0;
        $c = 1;
    }
    if ($ARGV[0] eq "-x") {
        # Xtract only
        $x = 1;
        $c = 0;
    }
    shift @ARGV;
}
else {
    # Default - do both
    $x = 1;
    $c = 1;
}


if (!($x && $c)) {
    # With -c or -x, only 2 filenames make sense
    die "Can only specify 3 filenames when neither -c nor -x is set" if ($#ARGV == 2);
    if ($#ARGV == 1) {
        if ($x) {
            ($infile, $scratch) = @ARGV;
        }
        if ($c) {
            ($scratch, $outfile) = @ARGV;
        }
    }
    else {
        die "Usage: xch -x source_file challenge_list\nOr: xch -c challenge_list blocked_list\n";
    }
}
else {
    if ($#ARGV == 1) {
        ($infile, $outfile) = @ARGV;
        $scratch = $outfile . '.scr';
        # print "$infile $scratch $outfile";
    }
    elsif ($#ARGV == 2) {
        ($infile, $scratch, $outfile) = @ARGV;
    }
    else {
        die "Usage: $0 raw-list [intermediate] blocked_list\n";
    }
}

# Main

if ($x) {
    xtract_URL();
}

if ($c) {
    challenge_server();
}

#############################################
sub xtract_URL {

    open INPUT, "< $infile" or die "Cannot open $infile: $!\n";
    open XTRACT, "> $scratch" or die "Cannot write $scratch: $!\n";

    $links = 1;

    print STDOUT "\nNote that xch does not rework links to www.....\nxcp is recommended.\n";

    while (<INPUT>) {
        $url = 0;
        if (m{ clickurl=http:// (.*?) ["/] }gsix) {
            $url = "http://$1";
        }
        elsif (m{ url=http:// (.*?) ["/] }gsix) {
            $url = "http://$1";
        }
        elsif (m{ < \s* A \s+ HREF \s* = \s* (["']) http:// (.*?) \s* ["/] }gsix) {
            $url = "http://$2";
        }
        if ($url) {
            print STDOUT $links++, "\t$url\n";
            # Skip URLs with query strings
            if ($url !~ /\?/) {
                print XTRACT "$url\n";
            }
        }
    }
    close INPUT;
    close XTRACT;
}

################################################

sub challenge_server {

    open CHALLENGE, "< $scratch" or die "Cannot open $scratch: $!\n";
    open RESPONSE, "> $outfile" or die "Cannot write $outfile: $!\n";

    # Set proxy option and search strings according to ISP
    # Note that the blocked-page response differs between ISPs

    # SCV MaxOnline
    if ($isp eq "scv") {
        $proxy = "";
        $banned = "The URL you requested has been blocked by the caching server";
        $banned2 = "Please contact your System Administrator with any questions";
    }

    # MySingTel
    if ($isp eq "mst") {
        $proxy = "--proxy proxy.zapsurf.com.sg:8080";
        $banned = "302 Redirect";
        $banned2 = "http://www.singnet.com.sg/cache/unable.html";
    }

    # MyStarHub
    if ($isp eq "msh") {
        $proxy = "--proxy proxy.mystarhub.com.sg:8080";
        $banned = "302 Redirect";
        $banned2 = "http://www1.starhub.net.sg/proxy/access.html";
    }

    $line = 0;
    while (<CHALLENGE>) {
        chomp;
        $line += 1;
        # SCV has changed its proxy error response. Now have to check the
        # web page returned rather than just the header
        if ($isp eq "scv") {
            $result = `curl --silent --connect-timeout 1 --max-time 3 $proxy "$_"`;
        }
        else {
            $result = `curl --head --silent --connect-timeout 1 --max-time 3 $proxy "$_"`;
        }
        if (($result =~ /$banned/) and ($result =~ /$banned2/)) {
            # Found a blocked site
            # Use strict matching (both strings) to avoid false positives
            # from similar text at the remote server
            print RESPONSE "$_\n";
            print STDOUT "$_ blocked\n";
        }
        else {
            print STDOUT "$line ";
        }
    }
    print "\n";
    close CHALLENGE;
    close RESPONSE;
}
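------------------------------------------------------------------

To recap usage (file names here are placeholders): save the script as xch and make it executable. Then

xch -scv bookmarks.html blocked_list

extracts URLs from bookmarks.html and writes the blocked ones to blocked_list, while

xch -scv -c challenge_list blocked_list

skips extraction and re-tests an existing list of URLs.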

Friday, October 10, 2003

IHTFP Hack Gallery: Hack Ethics: "The 'Hacker Ethic'

Over many years at MIT, a 'code of ethics' has evolved. This informal code is a self-enforced attitude that ensures that hacks will continue to be amusing and well-received both within and without MIT.

According to the 'hacker ethic,' a hack must:

+ be safe
+ not damage anything
+ not damage anyone, either physically, mentally or emotionally
+ be funny, at least to most of the people who experience it

There is no way of enforcing this code, but anything that directly contradicts it will probably not be considered a 'hack' by most of the MIT community. "

IHTFP Hack Gallery: Interesting Hacks To Fascinate People, the MIT Gallery of Hacks. Hacks galore!

IHTFP Hack Gallery: Hacks on Harvard: This is a good place to start your tour of the gallery.

Monday, October 06, 2003


Now this is a hack!


Well, actually it's just a spoof.
Oh, you meant this kind of Hacks.

Sunday, October 05, 2003

Blog created. Watch this space.