What’s in a URL?

Garrett L. Strzok

Word Project ITS1015

This article discusses Uniform Resource Locators (URLs) and how they are structured. It also discusses how you can use that knowledge to find information and speed up your browsing.

URL is short for Uniform Resource Locator. In many browsers, this is the string of text in the address bar that typically starts with http://… Each URL has several parts, and together they support a number of things that people commonly do.

Let's take a moment and break down a URL with the following example:

http://www.google.com/tools/firefox/toolbar/FT3/intl/en/index.html

One way to break this down is to look at each part of the URL in the following way.

<URI scheme> <Host name> <File path> <File name>

This URL may also be viewed in this way.

<Method of communication> <Server name or IP physically located somewhere else> <Path to the file on the server> <File you are opening>
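The same breakdown can be reproduced in a few lines of Python. This is only a sketch using the standard library's urllib.parse and posixpath modules; the variable names file_path and file_name are my own labels, chosen to match the four parts listed above.

```python
from urllib.parse import urlsplit
import posixpath

url = "http://www.google.com/tools/firefox/toolbar/FT3/intl/en/index.html"

parts = urlsplit(url)

# Split the path into the directory portion and the final file name.
file_path, file_name = posixpath.split(parts.path)

print("URI scheme:", parts.scheme)
print("Host name: ", parts.hostname)
print("File path: ", file_path)
print("File name: ", file_name)
```

Note that urlsplit itself only knows about the scheme, host, and path; dividing the path into a directory and a file name is an extra step that matches how this article describes the URL.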

Looking at the first part, http:, this is actually the scheme name of a URI (Uniform Resource Identifier), and it is terminated with a colon character ( : ). The remaining portion of the URI (//www.google.com/tools/firefox/toolbar/FT3/intl/en/index.html), called the scheme-specific part, is a string of characters that is defined and interpreted according to the scheme identified in front of it.

In this case, http: is a URI scheme that is also a protocol, but not all schemes are protocols.
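To make the split between the scheme and the scheme-specific part concrete, here is a minimal sketch that cuts the example URL at its first colon. It uses plain string handling rather than a full URI parser, so treat it as an illustration only.

```python
url = "http://www.google.com/tools/firefox/toolbar/FT3/intl/en/index.html"

# str.partition splits at the first colon only, so any later colons
# (for example in a port number) stay in the scheme-specific part.
scheme, colon, scheme_specific_part = url.partition(":")

print(scheme)
print(scheme_specific_part)
```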

“A protocol is a standardized means of communication among machines across a network. Protocols allow data to be taken apart for faster transmission, transmitted, and then reassembled at the destination in the correct order.” (Ref 1) The “Protocol” tells the browser how to communicate with the remote server when retrieving the target document (Ref 2). A subtle point is that URI schemes are often inaccurately called protocols, since most of them were originally used with particular protocols, but today there are URI schemes that have nothing to do with protocols. Several common schemes are in use, such as file:, http:, https:, ftp:, mailto:, news:, telnet:, data:, and many more.
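A quick way to see that not every scheme points at a server is to parse a handful of URIs and look at what comes back. The addresses below are made-up examples (example.com and the file path are placeholders, not real resources), and the sketch assumes Python's urllib.parse module.

```python
from urllib.parse import urlsplit

# Made-up example URIs, one per scheme, for illustration only.
examples = [
    "http://www.example.com/index.html",
    "https://www.example.com/index.html",
    "ftp://ftp.example.com/pub/readme.txt",
    "file:///home/user/notes.txt",
    "mailto:someone@example.com",
    "data:text/plain,hello",
]

for uri in examples:
    parts = urlsplit(uri)
    # netloc is empty when the URI names no server at all.
    print(f"{parts.scheme:<7} host={parts.netloc!r:<22} rest={parts.path!r}")
```

The mailto: and data: lines come back with an empty host, which fits the point that a scheme is a naming rule first and only sometimes a network protocol.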

Looking again at the URL, the second portion (//www.google.com) is the host name of the server, also called its domain name. A domain name stands in for an Internet Protocol (IP) address: Domain Name System (DNS) servers translate the name into the IP address, which the network then uses to make the connection between the machine you are using and the host server you are trying to reach. We also know that (//www.google.com) is interpreted in a way that is consistent with the structure defined by the scheme portion of the URL, in this case http:.
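The name-to-address lookup can be tried from Python. The sketch below is one possible way to do it with the standard socket module; resolve_host is a helper name I made up, and it returns None rather than raising an error when the lookup fails or the URL has no host.

```python
import socket
from urllib.parse import urlsplit


def resolve_host(url):
    """Return an IPv4 address for the host in url, or None if lookup fails."""
    host = urlsplit(url).hostname
    if host is None:
        return None
    try:
        # Asks the operating system's resolver, which in turn consults DNS.
        return socket.gethostbyname(host)
    except OSError:
        return None
```

Calling resolve_host("http://www.google.com/") on a connected machine should return one of the server's addresses. The exact value depends on when and where you ask, so none is shown here.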

Another host or domain name that you may recognize is:

//www.yahoo.com – which is the server that is physically located in California and shown below.

Moving on to the (/tools/firefox/toolbar/FT3/intl/en) portion of the URL, this is the path to the file on the server or host that we are connecting to. The path starts from your entry point on the server. That does not mean you start in the root directory of the server; you are usually brought into the web portion of the machine, where the web content is kept and maintained. So the path is relative to where you are brought in. Looking a little closer at this portion of the URL, we see that from the directory we entered we were directed to the tools directory, then to the firefox subdirectory, then toolbar, then FT3, then intl, and finally en (the en here most likely means English language). So now we are in that directory.
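The walk through the directories can be sketched in Python as well. This assumes the path is treated like an ordinary slash-separated file path, which is how this article reads it; the server is free to map the path to something else entirely.

```python
from pathlib import PurePosixPath

path = "/tools/firefox/toolbar/FT3/intl/en/index.html"

# parts[0] is the leading "/" (the web root) and parts[-1] is the file name,
# so the slice in between is the chain of directories.
directories = PurePosixPath(path).parts[1:-1]

# Print each directory indented one step further than its parent.
for depth, name in enumerate(directories, start=1):
    print("  " * depth + name)
```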

The last part of the URL (the final piece of the scheme-specific part defined by the http scheme) is the file name (index.html). This is the file that will be broken apart, sent across the internet, reassembled, and displayed in our browser. The file name is consistent with the scheme at the front of the URI: the .html (or .htm) ending marks it as an HTML document, which is what a browser most often requests over http, although the http scheme itself can deliver other kinds of files as well. There may also be other HTML files located in this directory, but our browser asked to open (index.html).
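Here is a small sketch of pulling the file name and its ending out of the path, and of how the ending hints at the kind of document. It uses Python's posixpath and mimetypes modules; the guess is based on the file name alone and is only a hint, since the server decides what it actually sends.

```python
import mimetypes
import posixpath

path = "/tools/firefox/toolbar/FT3/intl/en/index.html"

file_name = posixpath.basename(path)
stem, extension = posixpath.splitext(file_name)

# guess_type looks only at the extension; it never opens the file.
content_type, _encoding = mimetypes.guess_type(file_name)

print(file_name, extension, content_type)
```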

Now that we know a little about the structure of URLs, we can discuss how this information may be used to improve your personal browsing experience. We can watch the URL as we browse and start to see patterns in the sites that we visit. We can then modify the URL directly to find what we are looking for. Some people call this URL hacking. The term hacking usually carries a negative connotation, but there is nothing really all that negative about it.

Where the line falls between using the URL to speed up your browsing and being malicious is an area for debate. I have been looking at some of the things you can do with URL hacking, and some of it does start to venture into what I would personally consider questionable behavior, such as inserting code into places where you would otherwise enter form data. Depending on the type of application you are interacting with, this can start to cause interesting things to happen on the server.

I may use URL hacking to surf faster, and I may use it when I get a page not found error and still want to try to locate some information at that site. I also like to use certain URL tricks (hacks) when searching in Google. One such example is setting the as_qdr parameter to m7 (name=”as_qdr” value=”m7”), which narrows your Google search to pages indexed within the stated number of months (Ref 3).
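Two of the harmless tricks mentioned here, backing up through the directories after a page not found error and adding a search parameter, can be sketched as follows. The helper names parent_urls and add_query_parameter are my own, and the Google search address with its q parameter is an assumed example; only as_qdr and m7 come from Ref 3.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit
import posixpath


def parent_urls(url):
    """Return a URL for each directory above the file, nearest first."""
    parts = urlsplit(url)
    directory = posixpath.dirname(parts.path)
    results = []
    while True:
        path = directory if directory.endswith("/") else directory + "/"
        results.append(urlunsplit((parts.scheme, parts.netloc, path, "", "")))
        if directory in ("/", ""):
            break
        directory = posixpath.dirname(directory)
    return results


def add_query_parameter(url, name, value):
    """Return url with name=value added to (or replaced in) its query string."""
    parts = urlsplit(url)
    # Keep every existing parameter except an earlier copy of the same name.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != name]
    query.append((name, value))
    return urlunsplit(parts._replace(query=urlencode(query)))


example = "http://www.google.com/tools/firefox/toolbar/FT3/intl/en/index.html"
for candidate in parent_urls(example):
    print(candidate)

# Assumed search address, used only to show where the parameter lands.
print(add_query_parameter("http://www.google.com/search?q=url+structure",
                          "as_qdr", "m7"))
```

Neither function contacts a server. They only build the addresses you might then try by hand in the address bar.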

In conclusion, I hope you understand more about URLs, can start to watch them when browsing, and can learn how to use them to speed up your work and find what you are looking for faster. I know I have learned a lot more about URLs, URIs, and how browsing works in general.


References

Ref 1 – CITES, What is a URL?, Dec 29 2003, Oct 10 2008, http://www.cites.illinois.edu/101/url101.html

Ref 2 – Williams College Department of Computer Science, URL protocal specifications, May 11 2000, Oct 10 2008, http://www.cs.williams.edu/~cs105s00/outlines/CS105_70.html

Ref 3 – Tara Calishain and Rael Dornfest, Google Hacks, 2003, pg 32, O’Reilly Media Inc.
