Understanding the HTTP Protocol (Part 1)

by Don Parker [Published on 22 Sept. 2005 / Last Updated on 22 Sept. 2005]

The Internet is composed of all kinds of packets containing various types of traffic. One of the most used application layer protocols is HTTP. We will take a look at what really happens during an HTTP transaction, and thereby gain a deeper understanding of what this protocol is about.

If you would like to read the next article in this series please check out Understanding the HTTP Protocol (Part 2).

HTTP the protocol

Computer networking has been around for quite a few decades now, but it was not until the early 1990s that the Internet, as we know it today, began to take off. Around this time the Internet slowly became better known, and its popularity with the public increased. I have heard many explanations for why the Internet grew by leaps and bounds. I will go with the urban legend that it was not until the appearance of adult content websites that the Internet really grew popular.

What did these websites all have in common, though? They all used HTTP, the Hypertext Transfer Protocol, to conduct their business. While HTTP is carried by both IP and TCP, it is HTTP itself that allows you to interact with a web server. Your favorite web browser speaks HTTP to the web server, and in turn you get the web page that you requested.

There is much more to it than this, as a great deal of information needs to be exchanged between the web client (e.g. Internet Explorer) and the web server (e.g. IIS). It is this exchange of information that sets things up between the browser and the server. You could think of it as loosely analogous to the TCP handshake, as various details are sent back and forth to facilitate the web transaction.

So, on that note, let's take a look at what the web browser sends to the web server after the TCP/IP handshake is over. Once our three-way TCP handshake is done, the browser, Firefox in this case, sends its information to the server. Right below this sentence is a packet showing such an example. I will comment on the information contained within it directly below it.

10:14:50.479387 IP (tos 0x0, ttl 128, id 11651, offset 0, flags [DF], proto: TCP (6), length: 529) 192.168.1.100.1722 > 72.14.207.99.80: P, cksum 0x899f (correct), 3141402438:3141402927(489) ack 3866955399 win 65535
0x0000:  4500 0211 2d83 4000 8006 f1e5 c0a8 0164  E....@........d
0x0010:  480e cf63 06ba 0050 bb3d ff46 e67d 0e87  H..c...P.=.F.}..
0x0020:  5018 ffff 899f 0000 4745 5420 2f20 4854  P.......GET./.HT
0x0030:  5450 2f31 2e31 0d0a 486f 7374 3a20 7777  TP/1.1..Host:.ww
0x0040:  772e 676f 6f67 6c65 2e63 610d 0a55 7365  w.google.ca..Use
0x0050:  722d 4167 656e 743a 204d 6f7a 696c 6c61  r-Agent:.Mozilla
0x0060:  2f35 2e30 2028 5769 6e64 6f77 733b 2055  /5.0.(Windows;.U
0x0070:  3b20 5769 6e64 6f77 7320 4e54 2035 2e31  ;.Windows.NT.5.1
0x0080:  3b20 656e 2d55 533b 2072 763a 312e 372e  ;.en-US;.rv:1.7.
0x0090:  3130 2920 4765 636b 6f2f 3230 3035 3037  10).Gecko/200507
0x00a0:  3136 2046 6972 6566 6f78 2f31 2e30 2e36  16.Firefox/1.0.6
0x00b0:  0d0a 4163 6365 7074 3a20 7465 7874 2f78  ..Accept:.text/x
0x00c0:  6d6c 2c61 7070 6c69 6361 7469 6f6e 2f78  ml,application/x
0x00d0:  6d6c 2c61 7070 6c69 6361 7469 6f6e 2f78  ml,application/x
0x00e0:  6874 6d6c 2b78 6d6c 2c74 6578 742f 6874  html+xml,text/ht
0x00f0:  6d6c 3b71 3d30 2e39 2c74 6578 742f 706c  ml;q=0.9,text/pl
0x0100:  6169 6e3b 713d 302e 382c 696d 6167 652f  ain;q=0.8,image/
0x0110:  706e 672c 2a2f 2a3b 713d 302e 350d 0a41  png,*/*;q=0.5..A
0x0120:  6363 6570 742d 4c61 6e67 7561 6765 3a20  ccept-Language:.
0x0130:  656e 2d75 732c 656e 3b71 3d30 2e35 0d0a  en-us,en;q=0.5..
0x0140:  2d2d 2d2d 2d2d 2d2d 2d2d 2d2d 2d2d 2d3a  ---------------:
0x0150:  202d 2d2d 2d2d 2d2d 2d2d 2d2d 2d0d 0a41  .------------..A
0x0160:  6363 6570 742d 4368 6172 7365 743a 2049  ccept-Charset:.I
0x0170:  534f 2d38 3835 392d 312c 7574 662d 383b  SO-8859-1,utf-8;
0x0180:  713d 302e 372c 2a3b 713d 302e 370d 0a4b  q=0.7,*;q=0.7..K
0x0190:  6565 702d 416c 6976 653a 2033 3030 0d0a  eep-Alive:.300..
0x01a0:  436f 6e6e 6563 7469 6f6e 3a20 6b65 6570  Connection:.keep
0x01b0:  2d61 6c69 7665 0d0a 436f 6f6b 6965 3a20  -alive..Cookie:.
0x01c0:  5052 4546 3d49 443d 3031 6130 3832 3234  PREF=ID=01a08224
0x01d0:  3534 6163 6232 3933 3a4c 443d 656e 3a54  54acb293:LD=en:T
0x01e0:  4d3d 3131 3231 3633 3830 3934 3a4c 4d3d  M=1121638094:LM=
0x01f0:  3131 3231 3633 3830 3934 3a53 3d6a 2d30  1121638094:S=j-0
0x0200:  3970 3851 6870 5953 5f43 7253 500d 0a0d  9p8QhpYS_CrSP...
0x0210:  0a 

If you remember from the articles on TCP/IP that I wrote, you will realize that the application layer data, in this case HTTP, begins after the TCP header. In the dump above it starts on the line at offset 0x0020 with the bytes 4745 5420 (the ASCII characters "GET "), right after the 20-byte IP header and the 20-byte TCP header. Please understand that the entire remainder of the packet consists of HTTP data. We will now explain the various words that we see in the ASCII content of this packet.
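
To make the "begins after the TCP header" point concrete, here is a minimal Python sketch that computes where the HTTP data starts. It assumes a raw IPv4 packet with no link-layer header, as in the dump above; the function name http_payload_offset is made up for illustration, and only the first three lines of the dump are used.

```python
# First 48 bytes of the packet above, copied from the hex dump
# (offsets 0x0000 through 0x002f).
PACKET_HEX = (
    "4500 0211 2d83 4000 8006 f1e5 c0a8 0164 "
    "480e cf63 06ba 0050 bb3d ff46 e67d 0e87 "
    "5018 ffff 899f 0000 4745 5420 2f20 4854"
)


def http_payload_offset(packet: bytes) -> int:
    """Return the offset of the application data in a raw IPv4/TCP packet."""
    # IP header length: low nibble of byte 0, counted in 32-bit words.
    ip_header_len = (packet[0] & 0x0F) * 4
    # TCP data offset: high nibble of byte 12 of the TCP header, also in words.
    tcp_header_len = (packet[ip_header_len + 12] >> 4) * 4
    return ip_header_len + tcp_header_len


packet = bytes.fromhex(PACKET_HEX)
offset = http_payload_offset(packet)
print(hex(offset), packet[offset:])  # 0x28 b'GET / HT'
```

Both headers are 20 bytes here, so the HTTP data starts at offset 0x28, which is the middle of the 0x0020 line in the dump.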

GET / HTTP/1.1

This says that the web client is issuing a GET request to the web server, i.e. it wants something from it (here the root page, /), and that the web client understands HTTP 1.1. There is also HTTP 1.0, but that has largely been replaced by the newer HTTP 1.1. Currently there are efforts underway to deploy HTTP 2.0 at some future date.
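
As a rough illustration, the request line seen in the packet ("GET / HTTP/1.1") can be split into its three parts like this. This is only a sketch: RequestLine and parse_request_line are names invented for the example, and real servers are more tolerant and more careful than this.

```python
from typing import NamedTuple


class RequestLine(NamedTuple):
    method: str   # what the client wants to do, e.g. GET
    target: str   # what it wants, e.g. / for the root page
    version: str  # the HTTP version the client speaks


def parse_request_line(line: str) -> RequestLine:
    """Split a request line into method, target and version."""
    # The three parts are separated by single spaces; anything else
    # makes the unpacking below raise ValueError.
    method, target, version = line.rstrip("\r\n").split(" ")
    if not version.startswith("HTTP/"):
        raise ValueError(f"not an HTTP version: {version!r}")
    return RequestLine(method, target, version)


request_line = parse_request_line("GET / HTTP/1.1\r\n")
print(request_line.method, request_line.target, request_line.version)
```

Note that the space between the / and HTTP/1.1 matters; without it there are only two parts and the line is not a valid request line.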

Host: www.google.ca

This is the host name of the website that the request is meant for. HTTP 1.1 requires the Host header, since a single IP address may serve several websites.
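
Put together, the request line and the Host header already form a complete, if minimal, HTTP 1.1 request. The sketch below only builds the bytes and does not send anything; build_get_request is a hypothetical helper, not something the browser in the capture uses.

```python
def build_get_request(host: str, path: str = "/") -> bytes:
    """Build a minimal HTTP/1.1 GET request as raw bytes."""
    lines = [
        f"GET {path} HTTP/1.1",  # request line
        f"Host: {host}",         # the one header HTTP/1.1 insists on
        "",                      # empty line marks the end of the headers
        "",
    ]
    # Every line ends with CR LF (0d0a in the hex dump).
    return "\r\n".join(lines).encode("ascii")


request = build_get_request("www.google.ca")
print(request)
```

The trailing 0d0a 0d0a at the very end of the packet above is that same empty line closing the headers.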

User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1;

This tells the web server that the web client identifies itself as Mozilla/5.0 and is running on Windows NT 5.1, more commonly known as Windows XP, which is indeed what my operating system is.

en-US; rv:1.7.10)

This tells the web server that the web client is an en-US (US English) localized build and, if I am not mistaken, that it is built on revision 1.7.10 of the Gecko engine. Note that en-US is a language and region tag, not a character set.

Gecko/20050716 Firefox/1.0.6

This lists the exact web browser that the client is using, which is Firefox 1.0.6, along with the build date of its Gecko engine (20050716).
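
For the curious, the User-Agent string from the packet can be pulled apart with a few lines of Python. This is a sketch that only fits strings shaped like this one; the pattern and the parse_user_agent name are my own, and real-world User-Agent strings vary far too much for one regular expression.

```python
import re

# The full User-Agent value as it appears in the packet above.
USER_AGENT = (
    "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.10) "
    "Gecko/20050716 Firefox/1.0.6"
)

# Assumed shape: one token, a parenthesised comment, then product/version tokens.
UA_PATTERN = re.compile(r"^(?P<compat>\S+) \((?P<comment>[^)]*)\) (?P<products>.+)$")


def parse_user_agent(ua: str) -> dict:
    """Split a Firefox-style User-Agent string into its visible parts."""
    match = UA_PATTERN.match(ua)
    if match is None:
        raise ValueError("User-Agent does not have the assumed shape")
    comment = [part.strip() for part in match["comment"].split(";")]
    products = dict(token.split("/", 1) for token in match["products"].split())
    return {"compat": match["compat"], "comment": comment, "products": products}


info = parse_user_agent(USER_AGENT)
print(info["comment"])   # operating system, locale and Gecko revision
print(info["products"])  # {'Gecko': '20050716', 'Firefox': '1.0.6'}
```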

Accept: text/xml, application/xml, application/xhtml+xml

Here the client tells the web server which formats it can accept, for both text and application types, i.e. it will accept text in XML format and so on.

text/html; q=0.9, text/plain; q=0.8, image/png, */*;q=0.5

Here the client is saying that it will also accept text in both HTML and plain formats, images in PNG format, and, via */*, any other format. The q=0.8 and similar values are weighting values used by Firefox to indicate its preference for various MIME types. These are floating point values between 0 and 1; an entry without a q value defaults to 1.
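
The weighting is easier to see once the header is sorted by its q values. Below is a small sketch that does this for the Accept header from the packet; parse_accept is an illustrative name, and the sort is simplified (a real server would also prefer more specific types over wildcards when the q values tie).

```python
# The Accept value as it appears in the packet above.
ACCEPT = (
    "text/xml,application/xml,application/xhtml+xml,"
    "text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5"
)


def parse_accept(value: str) -> list:
    """Return (media_type, q) pairs, most preferred first."""
    entries = []
    for item in value.split(","):
        media_type, *params = [part.strip() for part in item.split(";")]
        q = 1.0  # an entry without a q parameter counts as q=1
        for param in params:
            name, _, val = param.partition("=")
            if name.strip() == "q":
                q = float(val)
        entries.append((media_type, q))
    # The sort is stable, so entries with equal q keep their original order.
    return sorted(entries, key=lambda entry: entry[1], reverse=True)


preferences = parse_accept(ACCEPT)
for media_type, q in preferences:
    print(f"{q:.1f}  {media_type}")
```

The XML types and image/png come out on top with q=1, followed by text/html, text/plain, and finally the catch-all */*.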

Accept-Charset: ISO-8859-1, utf-8; q=0.7, *;q=0.7

Listed here are the character sets that the web client will understand, i.e. ISO-8859-1 and utf-8, with * standing for any other character set.

Keep-Alive: 300
Connection: keep-alive

This asks the web server to keep the connection alive for 300 seconds, or until one side explicitly closes it. In HTTP 1.1 connections are persistent by default and stay open until the client or the server closes them, unlike HTTP 1.0, which by default closed the connection after every request. It makes a lot more sense to simply keep the connection open for a designated time, or until the web client terminates it.
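
The difference between the two versions boils down to a small decision, sketched here in Python. The is_persistent function and the plain dict of headers are assumptions made for the example; header names are looked up exactly as written, which a real implementation would not rely on.

```python
def is_persistent(version: str, headers: dict) -> bool:
    """Decide whether the connection stays open after the response."""
    connection = headers.get("Connection", "")
    tokens = {token.strip().lower() for token in connection.split(",")}
    if "close" in tokens:
        return False             # either version: an explicit close wins
    if version == "HTTP/1.1":
        return True              # HTTP 1.1 keeps the connection open by default
    return "keep-alive" in tokens  # HTTP 1.0 only does so when asked


# The request in the packet above: HTTP/1.1 with Connection: keep-alive.
print(is_persistent("HTTP/1.1", {"Connection": "keep-alive"}))  # True
print(is_persistent("HTTP/1.0", {}))                            # False
```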

Cookie: PREF=ID=01a0822454acb293: LD=en:TM=1121638094…..

This last piece of the puzzle is the cookie and its values. Many people have some odd conceptions when it comes to cookies. All a cookie is, is simple text, flat ASCII if you will. There is nothing executable about it. What a cookie does contain is whatever information the server asked your browser to store, e.g. preferences or a session identifier, and the browser sends it back with each request to that site. There are also two types of cookies: session-based and persistent. The first is only good for as long as your browser is running, and the second will remain on your computer's hard drive for as long as it has been programmed to. You can also look at the cookies in your browser if you so choose. If you have ever clicked “yes” on one of those “would you like us to remember your username and password” questions, then please realize that this is done via a cookie that the server stores on your computer.
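
To show just how plain a cookie is, here is a short sketch that takes the Cookie value from the packet apart using nothing but string splitting. parse_cookie_header is a name made up for the example, and reading the PREF value as colon-separated name=value fields is my assumption based on how it looks, not something the packet tells us.

```python
# The Cookie value as it appears in the packet above.
COOKIE_HEADER = (
    "PREF=ID=01a0822454acb293:LD=en:TM=1121638094:"
    "LM=1121638094:S=j-09p8QhpYS_CrSP"
)


def parse_cookie_header(value: str) -> dict:
    """Split a Cookie header into a name -> value mapping."""
    cookies = {}
    for pair in value.split(";"):  # several cookies are separated by ;
        name, sep, val = pair.strip().partition("=")
        if sep:
            cookies[name] = val
    return cookies


cookies = parse_cookie_header(COOKIE_HEADER)

# Assumption: the PREF value is itself a list of name=value fields joined by colons.
pref_fields = dict(field.partition("=")[::2] for field in cookies["PREF"].split(":"))

print(list(cookies))      # ['PREF']
print(pref_fields["LD"])  # en
```

Everything in there is text that the server handed to the browser earlier, and the browser simply hands it back.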

Conclusion

Well, this brings us to the end of part one of the HTTP article series. In this article we covered what is seen in the ASCII content of a packet when a web client first connects to a web server. As you can see, many fields of interest are listed, especially to someone who may be a hacker. Remember, the user agent string will reveal what your browser type and operating system are. This certainly helps a hacker who may be trying a client side exploit on you ;-). Till I see you in part two, have fun!
