Entries Tagged as 'Programming'

Proof of my programming ability

Yesterday we had a representative of one of our clients in for a meeting. I can’t say who it was, or what company they were from, but I spent most of the day working on some simple PHP programs for them. I had worked on them before, but they wanted some changes made. At about 4pm the client came to my desk and said that he had a problem that he thought I might be able to help with.

He wanted a program that would take in multiple IP addresses and find out not only where in the world they were from, but also who the IP address belonged to, basically the ISP of these IP addresses. The interesting thing was that I was in a race with one of their programmers who had been working on the same thing since Monday and was several hours away from completing it. So I started work.

At about 12:00 today I had a fully working IP address to ISP lookup tool which I packaged up and sent off to them. The client said it was exactly what he was looking for and that the data that their programmer had come up with was only a fraction of the information that my tool displayed. It also allowed the user to export the data into a CSV file, which he thought was brilliant.

Paul (my company director) was very impressed (he even shook my hand) and it looks as though I have made the company a lot of money. But not only that, on a personal level I beat a lead developer in another company by at least 4 days to produce a fully working tool that was said to be impossible.

Sometimes I surprise even myself!

Opening A Web Page As a File

Those of you who are PHP programmers will probably be familiar with opening a file. You use the PHP function fopen() to create a handle for the file, and then use this handle to do things like read or write to the file.

The usual syntax for fopen() is as follows. The first parameter is the file name and the second is the mode in which PHP will open the file. Be careful what you use here, as it affects the contents of the file and what you can do with it.

$handle = fopen("afile.txt", "r");

Did you also know that you can open a URL as a file? This means that you can grab the contents of a website quite easily. For example:

$handle = fopen("http://www.aurl.co.uk/", "r");

The difficulty next is how to read from this file handle. You can’t just make a single fread() call, as you can’t tell how big the page is, and therefore how much to download. There is a solution, but it depends on the version of PHP you are using. PHP 5 has a function called stream_get_contents() which will read the contents of the page into a variable. You can say how much of the page you want, but you can also just leave that out to get all of it. PHP 4 users will need to use fread() in chunks of 8192 bytes each until the page has been downloaded. The following if statement will give you the contents of a page that has previously been opened using the fopen() function.

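$contents = "";
// check the PHP version; phpversion() returns a string such as "5.2.0",
// so compare it with version_compare() rather than with >
if(version_compare(phpversion(), "5.0.0", ">=")){
	$contents = stream_get_contents($handle);
}else{
	// PHP 4: read in chunks until the end of the page
	while(!feof($handle)){
		$contents .= fread($handle, 8192);
	}
}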

You can now use the $contents variable to do whatever you want.
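As a rough sketch, the same idea can be wrapped in a function. Note that readAll() is a name I have made up for illustration, and php://memory stands in for a real URL so that the example runs without a network connection. It checks that stream_get_contents() exists rather than looking at version numbers.

```php
<?php
// Hypothetical helper: read everything left on an open stream handle.
function readAll($handle) {
    if (function_exists('stream_get_contents')) {
        // PHP 5 and later: read the rest of the stream in one call.
        return stream_get_contents($handle);
    }
    // Older PHP: read in 8192-byte chunks until the end of the stream.
    $contents = "";
    while (!feof($handle)) {
        $contents .= fread($handle, 8192);
    }
    return $contents;
}

// php://memory is used here in place of a URL (an assumption for the demo).
$handle = fopen("php://memory", "w+");
fwrite($handle, str_repeat("x", 20000));
rewind($handle);
$contents = readAll($handle);
fclose($handle);

echo strlen($contents);
```

With a real page you would pass the handle returned by fopen() on the URL instead.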

There is one complication: the page you want to get hold of may be behind server side authentication. fopen() can handle this if you put the username and password in the URL. Bear in mind that fopen() makes HTTP/1.0 requests by default rather than HTTP/1.1. Most servers should be set up to support this, so you should get away with it. To do this use the following code:

fopen("http://username:password@www.example.com","r");

This will allow you to get hold of the page with the proper authentication. Obviously you will need to exchange username for your username and password for your password. I found this really useful, so please borrow it for whatever you want.
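If you would rather not put the password in the URL, one alternative (a sketch only, and not the method described above) is to send the Authorization header yourself through a stream context. The username, password and URL here are placeholders, and the actual request is left commented out so that the example runs offline.

```php
<?php
// Placeholder credentials (assumptions for illustration only).
$username = "username";
$password = "password";

// Build an HTTP Basic Authorization header by hand.
$options = array(
    "http" => array(
        "method" => "GET",
        "header" => "Authorization: Basic " . base64_encode($username . ":" . $password) . "\r\n",
    ),
);
$context = stream_context_create($options);

// The context is passed as the fourth argument to fopen().
// Uncomment to make the real request:
// $handle = fopen("http://www.example.com/", "r", false, $context);
```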

Validating XHTML and the target attribute

You may not have realised, but XHTML 1.0 Strict doesn’t support the target attribute of the <a> tag.

For example, this code:

<a href="blabla.html" target="_blank">Bla bla</a>

Would not validate, as the "target" attribute was left out of the strict XHTML specification. This code is meant to open up a new window when the user clicks on the link.

The only way to solve this is to use JavaScript in one of two ways.

The first is to add lots of things to the a tag to get the tag to open in another window. This is a bit messy, but generally works. The trick is to remember to put all of this code every time you want a link to open in a new window.

<a href="http://www.norton42.org.uk" onclick="window.open(this.href); return false;" onkeypress="window.open(this.href); return false;">Norton42.org.uk</a>

As the first method is a bit messy the other solution is to drop the target attribute and use the rel attribute with the value of external. Like this:

<a rel="external" href="http://blahblah.com">new link</a>

Using this attribute you can then use a JavaScript function to add a target attribute to every link that has a rel attribute of external.

function externalLinks() {
 if (!document.getElementsByTagName) return;
 var anchors = document.getElementsByTagName("a");
 for (var i=0; i<anchors.length; i++) {
   var anchor = anchors[i];
   if (anchor.getAttribute("href") &&
       anchor.getAttribute("rel") == "external")
     anchor.target = "_blank";
 }
}
window.onload = externalLinks;

With the rel attribute the XHTML validates, and the JavaScript allows the <a> tags to open in another window. What we are doing here is getting JavaScript to rewrite the code on the site, which seems a little cheeky to me. However, if a user doesn’t have JavaScript then they won’t see any funny effects.

Note on PHP contact form

I recently came across a mechanism by which someone could take over a "Contact Form" written in PHP and use it to send spam. They do this by overfilling certain fields in the hope that they can add more parameters to the mail() function.

The mail() function in PHP has the following parameters:

mail( string to, string subject, string message [, string additional_headers [, string additional_parameters]] )

The additional headers parameter is used to convey anything else you can find in an email, these are things like CC and BCC. It is the BCC header that the spammer hopes to take over, adding his own set of addresses so that it looks as though you have been sending spam. He does this in one of two ways:

  1. By adding PHP code to the message box so that when the mail function is called there will be extra headers in the additional headers parameter.
  2. By adding code to the address bar of the browser in such a way that the form is changed. He might change one of the text boxes to a text area and use this new area to add the headers.

By doing either of these things he hopes to add another carriage return character and a BCC field after that.

There are a number of ways to combat this.

The first is to use the strip_tags() method. This will remove any PHP or HTML tags from any string. The strip_tags() method has two parameters. The first is the string that you are supplying to the method, the second is any HTML tags that you do actually want to be kept. You can use the following value to allow a number of HTML tags at once.

$allowedTags = '<p><br><b><i><strong><u><h1><h2><h3><h4><h5><h6>';

This will allow most of the formatting tags to be passed through. The main use of this function is to strip out any PHP tags.
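Here is a small sketch of strip_tags() with a list of allowed tags. The message text is made up for the example and the allowed list is a shortened one.

```php
<?php
// Tags that are allowed to survive (a shortened, illustrative list).
$allowedTags = '<p><br><b><i><strong><u>';

// Made-up form input containing allowed tags and a disallowed one.
$message = '<p>Hello <script>alert(1)</script><b>world</b></p>';

$clean = strip_tags($message, $allowedTags);
echo $clean;
```

Notice that strip_tags() removes the <script> tags themselves but leaves the text that was between them, so it tidies the markup rather than deleting the content.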

You can also use the str_replace() function to do more or less the same thing. This function takes three arguments. The first is what to look for in a string, also called the needle. The second is what to replace it with. The third is the string itself, also called the haystack. If you want to pass more than one needle in a single function call you can use an array of values, as in the following:

$deleteAllTags=array('<','>','','/','=','+');
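As a quick sketch of how an array like this is used, the call below deletes every listed character by replacing it with an empty string. The list is shortened and the subject string is made up for the example.

```php
<?php
// Characters to delete outright (a shortened, illustrative list).
$deleteAllTags = array('<', '>', '/', '=', '+');

// Made-up input for the demonstration.
$subject = 'Hello <b>there</b> 1+1=2';

// Every needle in the array is replaced with the empty string.
$clean = str_replace($deleteAllTags, '', $subject);
echo $clean;
```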

Using these two functions in conjunction should stop most stuff getting through. You should be checking all of the fields for validity anyway, and by doing this you stop most of these attacks getting in.
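Since the attack relies on getting a carriage return or line feed into a header, one extra check worth considering is to refuse any header value that contains either character. This is only a sketch: isSafeHeaderValue() is a name I have made up, and the addresses are examples.

```php
<?php
// Hypothetical helper: true if a value is safe to place in a mail header,
// meaning it contains no carriage return or line feed characters.
function isSafeHeaderValue($value) {
    return preg_match('/[\r\n]/', $value) === 0;
}

// Example values: a normal address and one with an injected BCC header.
$goodEmail = "visitor@example.com";
$badEmail  = "visitor@example.com\r\nBcc: victim@example.com";

var_dump(isSafeHeaderValue($goodEmail));
var_dump(isSafeHeaderValue($badEmail));
```

The form would then call mail() only when every value destined for the headers passes the check.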

The second method is to always use the POST method to send the data in the form to the server, and to read the fields from the POST data on the server. This way the form will ignore anything written in the address bar. As it is bad practice to use GET for a form like this anyway, you usually get this protection without even trying.
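A minimal sketch of reading fields from the POST data only is below. postField() is a made-up helper, and the two arrays simulate what would be $_POST and $_GET in a real script.

```php
<?php
// Hypothetical helper: fetch a form field from POST data only,
// ignoring anything supplied in the query string.
function postField(array $post, $name) {
    return isset($post[$name]) ? trim($post[$name]) : "";
}

// Simulated request data (in a real script these would be $_POST and $_GET).
$post = array("email" => " visitor@example.com ");
$get  = array("email" => "attacker@example.com", "bcc" => "spam@example.com");

// Only the POST array is consulted, so the query string values are ignored.
$email = postField($post, "email");
$bcc   = postField($post, "bcc");
```

Bear in mind that a determined sender can still forge a POST request, so the fields need validating either way.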

The contact us form on my website should stop this sort of thing. I have tested it as well as I can. If you want the code then contact me! If you have any other comments or questions then let me know.

Terrible Travesty of Validation

I was looking through some sites the other day, just wondering about which ones were validated or not. As it turns out most of them were not validated. However, there was one that stood far above the crowd as being a complete pile of poo in terms of validation.

There are a number of ways to check the validation of a site. You can go to the W3C validation site and type in the address of the site you want to check. Or you can download and install the very excellent Firefox and the even more brilliant Web Developer Extension, the work of a guy called Chris Pederick. This toolbar allows you to click on a button and validate any site that you are looking at, as well as a whole host of other really useful things.

The site I was talking about above was Amazon.co.uk. As of the date of this article the front page of Amazon has 1160 errors. This is very, very poor, for a number of reasons.

I am of the opinion that compliance with web standards is essential in any web site or web based product. It ensures that the site is accessible to people and machines and that the site will function with the advent of new browsers. The second point is work based, as it stops developers having to redesign their sites every time a new browser comes out. The first point, that of accessibility, should be at the forefront of every website developer’s mind, as sites that are not accessible do not conform to the UK Disability Discrimination Act. Part three of this act refers to the provision of goods, facilities and services, which specifically mentions websites. So if your site does not conform to at least level 2 of the Web Content Accessibility Guidelines (which includes validation) you will be in breach of the act and in danger of being fined or even sued.

The U.S.A. has an equivalent law called Section 508.

So my message to Amazon is this. I will continue to buy products from your site as I can’t fault your shop for its value and service. But please work towards validation! Having over a thousand errors on your site makes it look like you really don’t care.

Parsing XML with PHP

Part of the data structure of this blog involves the storing of data as a bunch of XML files. As a result I learnt a bit on how to parse XML with PHP. The following is an explanation of how to do this in a flat function structure. Later I will explain how to do more interesting things with this and then how to do this with a class. There are various different functions involved in parsing XML in PHP. I will go through each of them and then put it all together at the end.

xml_parser_create()

This function is used to create the parser object that will be used by the rest of the process. This is stored in a variable for use in the other functions associated with parsing XML. An example of the call is below.

$xml_parser = xml_parser_create();

xml_set_element_handler()

Next we need to set up the functions that will be used in the parsing of the XML. The xml_set_element_handler function takes the following parameters:

  • XML parser reference. This is a reference to the parser that was created using the xml_parser_create function.
  • Start element. This is a reference to a function that will be called when a start element is found as the parser runs.
  • End element. This is a reference to a function that will be called when an end element is found as the parser runs.

The last two parameters need to be functions with specific signatures, that is, they need to have the correct parameters, but you can call them whatever you want. Here is an example of the call to the function xml_set_element_handler().

xml_set_element_handler($xml_parser, "startElement", "endElement");

startElement function

Above the call to the function xml_set_element_handler you will need to have set out a function that will read start element data. The function needs to have the following parameters:

  • Parser: This is a reference to the xml parser that was created in the call to xml_parser_create.
  • Name: The name of the start element.
  • Attribs: This is an associative array of attributes that the start element may contain.

So your function might look something like this:

function startElement($xmlParser,$name,$attribs){
    echo "Start: ".$name."<br />";
};

All this will do is print off the name of the element, but you can do a lot more. For example, let’s say that one of your elements is called <title>. You can use an if or switch statement to store this value in a variable for use later. Like this:

function startElement($xmlParser,$name,$attribs){
    global $variable;
    $variable = $name;
};

It is safest to put this function declaration before the call to xml_set_element_handler, so that PHP already knows about the function when it points the parser towards it.
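The examples so far ignore the attributes, so here is a small sketch of a start element function that records them. The $found array and the sample XML string are made up for the example, and the XML is parsed from a string in one go to keep things short.

```php
<?php
// Collected element names and attributes (global, in the flat style used here).
$found = array();

function startElement($xmlParser, $name, $attribs) {
    global $found;
    // $attribs is an associative array of attribute name => value.
    $found[] = array("name" => $name, "attribs" => $attribs);
}

function endElement($xmlParser, $name) {
    // Nothing to do for this sketch.
}

$xml_parser = xml_parser_create();
// Keep names as written rather than uppercased.
xml_parser_set_option($xml_parser, XML_OPTION_CASE_FOLDING, false);
xml_set_element_handler($xml_parser, "startElement", "endElement");
xml_parse($xml_parser, '<post id="42"><title lang="en">Hello</title></post>', true);

print_r($found);
```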

endElement function

This function is called when the parser encounters an end element. In the opposite operation to before, you might need to clear the variable you stored during the start element function. Again, this declaration should come before the call to xml_set_element_handler. Note that a self-closing tag still triggers this function, straight after the start element function. The function must have the following parameters.

  • xml_parser: The parser created in the call to xml_parser_create.
  • name: The name of the element.

The following code will just print out the name of the end element. You can use this function to undo anything that may have happened in the startElement function. For example, you may have set a value in startElement to keep track of the depth of the parser into the XML document, and you can use this function to reduce it. This might be important if there is more than one element with the same name, but in a different context.

function endElement($parser,$name){
	echo "End: ".$name."<br />";
};

xml_set_character_data_handler()

The next function to call is xml_set_character_data_handler. This takes two parameters:

  • xml_parser: This is a reference to the xml parser that was created in the call to xml_parser_create.
  • characterData: This is a reference to the function that will be called when character data is found.

This function works in the same way as the xml_set_element_handler function in that it simply sets a reference to the function that will be called when character data is encountered. The function is called like this.

xml_set_character_data_handler($xml_parser, "characterData");

characterData function

The characterData function, which again should be placed before the call to xml_set_character_data_handler, must have the following parameters.

  • xml_parser: The reference to the xml parser created in the call to xml_parser_create.
  • data: The data held within the XML element. If any CDATA tags have been used then the parser will return everything between those tags, so there is no need to worry about cutting them out.

So when the parser object finds a data object this method is called. The following function will just print out the data.

function characterData($parser,$data){
	echo "Data: ".$data."<br />";
};

One thing that it is essential to look out for is the funny thing that the parser does when it encounters certain conditions. It will stop, hand over the data it has so far and call the function again for the next piece. This repeats until all of the data has been passed. I've listed (I think) all of the conditions below.

  • The parser runs into an entity reference, such as &amp; (&) or &#039; (')
  • The parser finishes parsing an entity.
  • The parser runs into the new-line character (\n)
  • The parser runs into a series of tab characters (\t)
  • The content of the $data parameter is more than 1024 bytes.

The best way to explain this is to use an example. Let’s say that you have the following string as part of the data.

some text&amp;
some more text&#039;
last bit of text

If you used the previous example function of just printing out the information then the parser will print out the following:

Data: some text
Data: &
Data: some more text
Data: '
Data: last bit of text

So make sure that your function copes with the character data arriving in several pieces. One thing you could do is to have the characterData function add the data to a string. The string is initialised when the startElement function is called and printed off when the endElement function is called.
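The string-building approach can be sketched roughly like this. The $buffer and $results variables and the <note> element are made up for the example, and the sketch assumes elements that contain only text; nested elements would need a stack of buffers instead.

```php
<?php
$buffer = "";          // character data collected for the current element
$results = array();    // finished element name => text pairs

function startElement($parser, $name, $attribs) {
    global $buffer;
    $buffer = "";      // start with an empty string for each element
}

function characterData($parser, $data) {
    global $buffer;
    $buffer .= $data;  // may be called several times per element
}

function endElement($parser, $name) {
    global $buffer, $results;
    $results[$name] = $buffer;   // the complete text is available here
    $buffer = "";
}

$xml_parser = xml_parser_create();
xml_parser_set_option($xml_parser, XML_OPTION_CASE_FOLDING, false);
xml_set_element_handler($xml_parser, "startElement", "endElement");
xml_set_character_data_handler($xml_parser, "characterData");
xml_parse($xml_parser, "<note>some text&amp;\nsome more text&#039;\nlast bit of text</note>", true);

echo $results["note"];
```

However many times characterData is called, the text is only used once the end element arrives, so the pieces come out joined back together.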

xml_parser_set_option

This method is optional and can be used if you want the parser to have a certain behaviour. For example, to turn off case folding on the parser use the following code.

xml_parser_set_option($xml_parser, XML_OPTION_CASE_FOLDING, false);
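To see what this option changes, here is a small sketch that parses the same XML twice, once with case folding on and once with it off, and collects the element names passed to the start element function. elementNames() is a made-up helper, and it uses anonymous functions, so it needs PHP 5.3 or later.

```php
<?php
// Hypothetical helper: parse $xml and return the element names the
// start handler receives, with case folding switched on or off.
function elementNames($xml, $caseFolding) {
    $names = array();
    $parser = xml_parser_create();
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, $caseFolding);
    xml_set_element_handler(
        $parser,
        function ($parser, $name, $attribs) use (&$names) { $names[] = $name; },
        function ($parser, $name) { }
    );
    xml_parse($parser, $xml, true);
    return $names;
}

$folded   = elementNames('<blog><Title>Hi</Title></blog>', true);
$unfolded = elementNames('<blog><Title>Hi</Title></blog>', false);

print_r($folded);
print_r($unfolded);
```

With folding on the names arrive in uppercase, and with it off they arrive exactly as written in the file.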

Case folding is basically the turning of characters to their uppercase equivalent. However, XML is case sensitive and XHTML requires all tags to be lowercase, yet for some reason the default of the parser is for this to be on. So if you create W3C valid XML make sure that you use this function to turn off case folding. Here is a list of the available options for this function.

  • XML_OPTION_CASE_FOLDING (integer) Controls whether case-folding is enabled for this XML parser. Enabled by default.
  • XML_OPTION_SKIP_TAGSTART (integer) Specify how many characters should be skipped in the beginning of a tag name.
  • XML_OPTION_SKIP_WHITE (integer) Whether to skip values consisting of whitespace characters.
  • XML_OPTION_TARGET_ENCODING (string) Sets which target encoding to use in this XML parser. By default, it is set to the same as the source encoding used by xml_parser_create(). Supported target encodings are ISO-8859-1, US-ASCII and UTF-8.

xml_parse()

This function is used to run the parser over some input. It takes the following parameters:

  • xml_parser: This is a reference to the xml parser created in the xml_parser_create() function.
  • data: A chunk of data to parse.
  • end (optional): If this is set to true then this is the last bit of data from the source.

As you can see the xml_parse() function can be run over and over again until all of the data has been read from the file.

if(!($fp = fopen("an_xml_file.xml", "r"))){
    die("could not open XML input");
}
while($data = fread($fp,4096)){
    if(!xml_parse($xml_parser,$data,feof($fp))){
        die(sprintf("XML error: %s at line %d",xml_error_string(xml_get_error_code($xml_parser)),xml_get_current_line_number($xml_parser)));
    }
}

xml_parser_free()

As the name suggests this function is called at the end of the XML parsing run. It basically just clears up the memory and throws away the XML parser created at the start.

Well, there you are then. I think you should be able to figure out the main points from there, but just as an example I have put the code together into something that will spit out XML as formatted HTML, albeit a little ugly. Have fun!

// the start element function
function startElement($xmlParser,$name,$attribs){
	echo "Start: ".$name."<br />";
};
// the end element function
function endElement($parser,$name){
	echo "End: ".$name."<br />";
};
function characterData($parser,$data){
	echo "Data: ".$data."<br />";
};
$xml_parser = xml_parser_create();
xml_parser_set_option($xml_parser,XML_OPTION_CASE_FOLDING, false);
xml_set_element_handler($xml_parser,"startElement","endElement");
xml_set_character_data_handler($xml_parser,"characterData");
if(!($fp = fopen("an_xml_file.xml","r"))){
    die("could not open XML input");
};
while($data = fread($fp,4096)){
    if(!xml_parse($xml_parser,$data,feof($fp))){
        die(sprintf("XML error: %s at line %d",xml_error_string(xml_get_error_code($xml_parser)),xml_get_current_line_number($xml_parser)));
    };
};
fclose($fp);
// clear up the memory used by the parser
xml_parser_free($xml_parser);

Feel free to use the material contained in this tutorial for whatever you want.

Student Database

I have just made the finishing touches to a database that I am creating for Mandy to use at her work. She is a secretary in the Language and Learning Center in the University of Aberystwyth. They have this old Access database written by a guy who didn’t know much about Access, so it doesn’t do what they need. I took the job on about 6 months ago, but I keep forgetting about it (I am doing it for free), so I have made the effort and should have it complete before the next academic year, which starts September 20th(ish). It is basically a student/course tracker with some extra bits thrown in, e.g. it can create invoices and payment records. If anyone wants a look then let me know.