Showing posts with label programming. Show all posts

Wednesday, 17 August 2011

A bug in Google search?

I think I have just found a (small) bug in Google search. Unfortunately this explanation may get a little technical for non-programmers, but I will try to explain the problem as simply as I can.

I was looking through some logs for my website and I found that someone had entered a search term akin to 'britishwalks walk 906'. The search led to this search query.

As you can see, Google search displayed the first couple of lines of each entry; in the case of my website the first hit contained the following: '7 Jan 2011 – Walk #906:'

This is strange, as the walk was actually walked on the 1st of July. This date is present in my webpage as 01/07/2011, in UK format (day/month/year). The search engine is evidently reading it as if it were in American format (month/day/year), treating 01 as the month and 07 as the day, before displaying the month in three-letter textual form: '7 Jan' instead of '1 Jul'.

The previous walk was walked on the 30th of June, and that date displays correctly within Google search. This suggests that Google performs some validity checking: read in US order, 30/06/2011 would mean month 30, and as there are only 12 months that reading is invalid, so the date is displayed in UK order instead. The fault is present on every webpage I have checked where the day is 12 or less, since only then is the US reading possible.
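The inferred behaviour can be sketched in C. This is only a guess at the logic, not Google's code: the function names guess_date() and format_date(), and the 'US order first, UK order as a fallback' rule, are assumptions based purely on the results above. Days are only checked against 31 rather than the real length of each month.

```c
#include <stdio.h>

/* Three-letter month names, indexed from 0 (January). */
static const char *const MONTHS[12] = {
    "Jan", "Feb", "Mar", "Apr", "May", "Jun",
    "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
};

/* Hypothetical sketch of the inferred rule, NOT Google's implementation.
   Parses "AA/BB/YYYY", trying US order (month/day/year) first and
   falling back to UK order (day/month/year) only when the US reading is
   impossible. Days are only checked against 31, not the real month length.
   Returns 0 on success, -1 if neither reading is valid. */
static int guess_date(const char *text, int *day, int *month, int *year)
{
    int a, b, y;
    if (sscanf(text, "%d/%d/%d", &a, &b, &y) != 3)
        return -1;
    if (a >= 1 && a <= 12 && b >= 1 && b <= 31) {
        *month = a;                  /* US reading: AA is the month */
        *day = b;
    } else if (b >= 1 && b <= 12 && a >= 1 && a <= 31) {
        *month = b;                  /* UK fallback: BB is the month */
        *day = a;
    } else {
        return -1;
    }
    *year = y;
    return 0;
}

/* Formats a date as e.g. "7 Jan 2011", as in the search result. */
static void format_date(int day, int month, int year, char *out, size_t len)
{
    snprintf(out, len, "%d %s %d", day, MONTHS[month - 1], year);
}
```

Under this rule 01/07/2011 comes out as 7 Jan 2011 and 12/07/2011 as 7 Dec 2011, while 30/06/2011 can only be read one way and comes out correctly as 30 Jun 2011.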

I have had a quick (but hardly exhaustive) ponder and cannot think of any way my pages could be creating this problem. Likewise, I cannot think of a way of setting your locale in an HTML page to let them know the format of items like dates. I could use a locale-neutral format such as yyyy-mm-dd (e.g. 2011-08-17), but that is far less obvious to my readers, the vast majority of whom are in the UK.
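For comparison, the locale-neutral format and the UK format can both be produced from the same date with the standard C function strftime(). The helper name format_both() is hypothetical; the point is only that %Y-%m-%d puts the year first, so a reader (human or machine) has no day/month order to guess.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical helper: writes the same calendar date in ISO 8601
   (yyyy-mm-dd) and UK (dd/mm/yyyy) order. Each buffer must hold at
   least 11 bytes. Returns 0 on success, -1 if a buffer is too small. */
static int format_both(int year, int month, int day,
                       char *iso, char *uk, size_t len)
{
    struct tm t = {0};
    t.tm_year = year - 1900;   /* struct tm counts years from 1900 */
    t.tm_mon = month - 1;      /* ...and months from 0 */
    t.tm_mday = day;
    if (strftime(iso, len, "%Y-%m-%d", &t) == 0)
        return -1;             /* locale-neutral, e.g. 2011-07-01 */
    if (strftime(uk, len, "%d/%m/%Y", &t) == 0)
        return -1;             /* UK order, e.g. 01/07/2011 */
    return 0;
}
```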

Perhaps Google could default to the UK date format if the domain is a '.uk' one or is registered in the UK; this would be much more work for them, and would still be prone to errors. It may be far simpler for them not to convert the date to include a month name, and instead to display it as it appears on the webpage.

This is hardly a major bug or feature, but it is nonetheless interesting. Why do they parse a plain-text date within a webpage and convert it to another format? Do they do this for any other date on the page, and if so, are these conversions prone to similar errors?

I have done a quick search (with Google, naturally...) and cannot see this reported anywhere else. This may mean that the bug has only just been introduced, or that it is a transient feature.

It should also be noted that Google's results are far more helpful than Bing's; Bing does not even include the obvious webpage in its results.

Saturday, 9 April 2011

Visual Basic

Many people hate Visual Basic, for two main reasons. Firstly, it is produced by Microsoft, a company that attracts a great deal of derision (both rightly and wrongly). Secondly, many people see it as not being 'proper' programming, and a bit of a toy.

The second point is the more valid: although the language has been tidied up in the last few years with the conversion to .NET, it is still a little noddy, and does not require a great deal of skill to knock up a simple program. Yet that is also its beauty, as it is an excellent rapid prototyping tool.

I was in a slight funk last weekend - I did not want to do any writing, nor any work on the website or around the house. My mind was totally focussed on the walk that I was planning for later in the week. Unfortunately I am incapable of lazing around and doing nothing. So whilst Sean Connery tried to steal Red October, I loaded Visual Basic 2010 onto my new laptop and had a play. I had not used Visual Basic for at least a year, and it was my first experience of the latest version.

A few months ago I wrote a post about MP3 players. At the time, I did a quick and dirty calculation of the total playing time of my MP3 collection. The rough figure I came up with was 44 days.

So I decided to fend off boredom by working out a better figure. Within two hours I had created a Visual Basic program with a front-end that scanned through my podcasts and worked out the total duration of all the files (*). The figure: over 80 days of files, and growing by at least a day a week.
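The totalling step itself is simple arithmetic. Below is a minimal C sketch of that step only, not the Visual Basic program: it assumes the per-file durations (in seconds) have already been read from the files, which is the fiddly part and is left out here, and total_duration() and struct span are illustrative names.

```c
#include <stddef.h>

/* A total duration broken down into days, hours and minutes. */
struct span {
    long long days, hours, minutes;
};

/* Sums per-file durations (in seconds) and splits the total into
   days, hours and minutes; leftover seconds are dropped. */
static struct span total_duration(const long *seconds, size_t count)
{
    long long total = 0;
    for (size_t i = 0; i < count; i++)
        total += seconds[i];

    struct span s;
    s.days = total / 86400;            /* 24 * 60 * 60 seconds per day */
    s.hours = (total % 86400) / 3600;
    s.minutes = (total % 3600) / 60;
    return s;
}
```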

Of course, this could have been done in other languages, such as C, Perl or Python. But Visual Basic gave me a program that could run on any Windows PC without having to install any other languages or support infrastructure. In two hours I managed to write a program and user interface that solved the problem at hand. I did so without being an expert in the language, and without having used the latest version before. What is more, it was fun.

There is no right or wrong programming language: they all have uses (yes, including Modula-3). A good programmer knows several languages, and picks the right one for the job in hand. A bad programmer weds himself to one language and uses it even when it is not appropriate.

So thank you, Microsoft, for Visual Basic. It does its job, and does it well.

(*) I use the track duration as reported within the file, which can be wrong. A better way would be to parse through the files and calculate the number of samples. This would be an easy change, but would take an eternity to run. The current system will do for the moment.
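Counting samples turns the calculation into simple arithmetic: the duration is the number of samples divided by the sample rate. A C sketch, assuming the frames have already been counted by scanning each file's frame headers (the slow part, not shown) and that every frame is MPEG-1 Layer III audio, which carries 1152 samples per frame; the function name duration_ms() is hypothetical.

```c
/* Samples per frame for MPEG-1 Layer III audio. */
#define SAMPLES_PER_FRAME 1152ULL

/* Hypothetical helper: playing time in milliseconds, given the number of
   frames counted in the file and its sample rate in Hz (e.g. 44100).
   Rounds down to the nearest millisecond. */
static unsigned long long duration_ms(unsigned long long frames,
                                      unsigned long sample_rate)
{
    if (sample_rate == 0)
        return 0;                      /* avoid dividing by zero */
    unsigned long long samples = frames * SAMPLES_PER_FRAME;
    return samples * 1000ULL / sample_rate;
}
```

For example, 1000 such frames at 44100 Hz give 1000 x 1152 / 44100, or roughly 26.1 seconds.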

Sunday, 16 January 2011

Planning for failure

Years ago I heard a story - possibly apocryphal - about the emergent electronics industry in the sixties. A large American company wanted to get the contract for building some of the Saturn V / Apollo hardware. They worked on their proposal, costed it and got ready for the meeting with NASA.

They were surprised to find that the NASA team was made up mostly of engineers. This team sat through the company's slick presentation without comment until the end, when they were asked if they had any questions. One of the NASA engineers asked a simple question: "How does it fail?"

The company's marketing men were shocked and did not have an answer. They had prepared for the meeting with lots of questions relating to cost, timescales and capabilities, but this first question totally stumped them.

So why did NASA want to know how it would fail? And why was it their first question? The answer is simple: they trusted the company to meet the specification requested; after all, that was their job. However, they wanted to ensure that if it failed it would not damage any of the other components made by other companies.

After that, the company always had engineers in their meetings with NASA, and always made sure they knew how failure of their devices would affect the rest of the system.

Many of the common bugs in computer programs are caused by the programmer not planning for failure.

Let us take one simple and common function from the C standard library. malloc() allocates an area of memory for use by the programmer. On the vast majority of occasions it will succeed, returning a pointer to the memory. However, sometimes it will fail, returning a null pointer instead. It is common to see code where the programmer does not check for this failure case. The reason is that checking for all possible failures takes time, and programmers are more interested in the cases where it works.

For instance, the following line of code, whilst nominally correct, will have me tearing my hair out:
int *broken_ptr = malloc(20);

A better example would be the following:
int *good_ptr = malloc(20 * sizeof(*good_ptr));
if (good_ptr == NULL)
{
  // Failed to allocate memory, must recover.
}
else
{
  // We can now do something with good_ptr[0] to good_ptr[19].
  // We have finished with the buffer. Free the memory.
  free(good_ptr);
  good_ptr = NULL;
}

Even a non-programmer can see that the second example takes far longer to write and requires much more thought. It is, however, much better code (although still not perfect): it checks whether the allocation failed, it sizes the buffer in terms of the type it holds rather than as a bare 20 bytes, and it frees the memory once it has finished with it. In particular, the programmer will need to consider exactly how to recover from the failure to allocate the memory. Unfortunately, misuse of malloc() in C is a prominent cause of programming bugs.
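One way to handle the recovery is to have the allocating function report failure to its caller, rather than deciding there and then what to do. A minimal C sketch, using a hypothetical helper alloc_int_buffer(); it also refuses a count so large that the size calculation would overflow, a quieter failure that a bare malloc(count * sizeof(int)) can hide.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: allocates a zero-filled buffer of `count` ints.
   On success stores it in *out and returns 0. On failure sets *out to
   NULL and returns -1, leaving the recovery decision to the caller. */
static int alloc_int_buffer(size_t count, int **out)
{
    *out = NULL;

    /* malloc(0) may legitimately return NULL, so treat zero as an error,
       and refuse counts for which count * sizeof(int) would overflow. */
    if (count == 0 || count > SIZE_MAX / sizeof(int))
        return -1;

    int *buf = malloc(count * sizeof(*buf));
    if (buf == NULL)
        return -1;                     /* out of memory: report, don't crash */

    for (size_t i = 0; i < count; i++)
        buf[i] = 0;
    *out = buf;
    return 0;
}
```

The caller then checks the return value and chooses its own recovery, such as retrying with a smaller buffer or reporting the error to the user, and remains responsible for calling free() on success.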

Similar problems can be seen in many other forms of engineering, most visibly in 'cascade failures', where the failure of one part of a system causes other parts to fail in turn. This is a particular risk in power transmission systems, and engineers strive to design against it.

The key is to give engineers the time to design and implement systems fully. It is relatively trivial to get a system working; the real work lies in making it work properly in all cases, including the unforeseen.