Sunday, 20 November 2016

What is HTTP?



The "Fast Lane" Answer

HTTP stands for "HyperText Transfer Protocol," and it's the protocol behind most communication on the World Wide Web. It's the set of rules that governs the client/server exchange between your web browser and the server hosting the page you want. Like a butler, it carries your requests to the server and brings the server's responses back to you.

When a client wants something from a website's server, it first opens a connection to that server using TCP. It then sends an HTTP request over that connection. The server pulls up the requested information and sends it back as an HTTP response, which travels to the client over the same connection.
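Here is a minimal sketch of that sequence in Python, using only the standard library. It spins up a tiny stand-in server on your own machine (the `CatHandler` class, the `/cute-kitties` path, and the `meow` body are made up for illustration) and then plays the client by hand: open a TCP connection, write the HTTP request as plain text, and read back whatever the server sends.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class CatHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the cat-picture site: answers every GET with a tiny text body."""

    def do_GET(self):
        body = b"meow"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch(host: str, port: int, path: str) -> bytes:
    # Step 1: open a TCP connection to the server.
    with socket.create_connection((host, port)) as sock:
        # Step 2: send the HTTP request, which is just text written to the connection.
        request = (
            f"GET {path} HTTP/1.1\r\n"
            f"Host: {host}:{port}\r\n"
            "Connection: close\r\n"
            "\r\n"  # a blank line ends the request headers
        )
        sock.sendall(request.encode("ascii"))
        # Step 3: read the response until the server closes the connection.
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)


if __name__ == "__main__":
    server = HTTPServer(("127.0.0.1", 0), CatHandler)  # port 0: let the OS pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address
    print(fetch(host, port, "/cute-kitties").decode())
    server.shutdown()
```

Running it prints the raw response: a status line, a few headers, a blank line, and then the body. Python's `http.server` answers with version `HTTP/1.0` by default, which is also why it closes the connection after a single response.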

Let's look at an example. Say you want to look at some adorable cat pictures. You direct your browser to your favorite cat-picture website. (You and your browser are the "client" and the cat site is the "server.") HTTP carries your request for cute kitties to the server, the site pulls up a picture, and HTTP brings that picture back to you.

The back and forth continues, like a game of hot potato, for as long as the HTTP session lasts (that is, for as long as you keep sending requests to the server), with each response feeding your web browser another piece of what it needs to show you what you asked for: the page itself, its images, its styling.

The "Scenic Route" Answer

If you're like most people, there are more than a few things you don't know how to read in a URL (Uniform Resource Locator). One of those things probably looks like this:

http://

Well, brace yourself, because we're about to help you feel really smart. HTTP is both simple and critical, and you're bound to have a lot of fun knowing something all the other kids at school don't know.

Putting the "Hyper" in Hypertext

HTTP stands for HyperText Transfer Protocol. It is the set of rules that governs how hypertext (web pages and the links between them) is requested and delivered. On a larger scale, it's the part of the internet that actually carries your requests from your computer to the website you're browsing. Like many things going on when you use the internet, you don't actually notice HTTP, unless something goes wrong. It serves you dutifully every day, and today you're finding out what those four letters at the front of the URL mean.

All this might prompt the question, "What is hypertext? Is that like a superhero name or something?" While that latter question gives us some crazy-cool ideas, we sadly have to answer "no" (or at least, "not yet"). Hypertext is text that contains links to other text, and it's what makes the web so interconnected. Whenever there's a word or section of text online that links to something else, that's hypertext.

The format of hypertext that you're likely most familiar with is the blue-highlighted words that you can click on to go to other web pages. They're the reason you spend 3 hours reading instead of 3 minutes when you pull up Wikipedia.
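Under the hood, those clickable words are HTML anchor (`<a>`) tags whose `href` attribute points somewhere else. The sketch below uses Python's standard `html.parser` module on a made-up snippet of HTML to pull out every link target, which is roughly the list of places a browser could send you next. The `LinkCollector` class is our own, written for illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href targets of <a> tags: the links that make text 'hyper'."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag's attributes
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


page = (
    '<p>Read about <a href="https://en.wikipedia.org/wiki/Cat">cats</a> '
    'and <a href="/wiki/Teapot">teapots</a>.</p>'
)
collector = LinkCollector()
collector.feed(page)
print(collector.links)  # ['https://en.wikipedia.org/wiki/Cat', '/wiki/Teapot']
```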

Layers of the Internet

So here's how it works: the internet is built in layers, like a cake. Some layers are occupied by programs, some are occupied by sequences of rules and process methods, and some are simply filled with a bit of information or two. The highest layer is the layer you interact with, the application layer. That's where your web browser sits. You know, the thing you're using right now to read this article.

Below that are layers like TCP (which breaks your data into packets, makes sure they all arrive in order, and reassembles them at the other end) and IP (which addresses those packets and routes them across the network to the right computer). HTTP itself belongs to the application layer, right alongside your browser: it's the set of rules your browser uses to phrase its requests and understand the answers. It relies on a TCP connection to carry those requests to the destination website and bring the responses back.

Diagram of an HTTP Connection

Let's start with how you go places. You see that bar up at the top of the browser? That web address that starts with "http" is the URL, and you might consider it the GPS coordinates of where you currently are on the internet. Usually it's built around the domain name of the website in question, but it often has fun crazy bits you don't know how to read, including lots of forward slashes, percent signs, ampersands, and other characters that seem to prove the developers just want to mess with us.
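Those crazy bits do follow rules. Here's a sketch using Python's standard `urllib.parse` module to take apart a made-up URL (`example.com` is a domain reserved for examples): the scheme, the host, the path, the query string with its `&`-separated pieces, the fragment after `#`, and the `%20` escape that stands for a space.

```python
from urllib.parse import parse_qs, unquote, urlsplit

# A made-up URL chosen to contain slashes, a %-escape, an & and a #.
url = "https://www.example.com/search/cat%20pictures?color=orange&size=large#results"

parts = urlsplit(url)
print(parts.scheme)           # https
print(parts.netloc)           # www.example.com
print(parts.path)             # /search/cat%20pictures
print(unquote(parts.path))    # /search/cat pictures  (%20 decoded to a space)
print(parse_qs(parts.query))  # {'color': ['orange'], 'size': ['large']}
print(parts.fragment)         # results
```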

URLs can be entered in full, or as just the base domain name (e.g., entering SmartyStreets.com will still lead you to us). Once you have the URL punched in, your browser goes right to work, opening a connection between your computer and the computer running that particular website and sending it an HTTP request.

We call these situations client/server setups. The client is you, the user, with your device and system on that end. The server is the physical hardware on the other end that runs the website you're trying to access. HTTP is the go-between, mediating between the client and the server, with TCP and IP doing the actual delivery. The entire exchange, from the first request to the last, is called an HTTP session.

To facilitate these sessions, there's a fun thing on the server side called a daemon, waiting for HTTP requests to come in. The whole purpose of this Hypertext Transfer Protocol daemon (or HTTPD) is to wait for these requests. When it gets one, it helps the server begin processing and responding to the request.
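To get a feel for what "waiting for requests" means, here's a toy daemon sketched with Python's `socket` module. The `serve_once` function is hypothetical and deliberately tiny: it blocks until one client connects, reads the request up to the blank line that ends the headers, sends back a fixed `200 OK` response, and returns the request line it received. A real HTTP daemon (the Apache web server's program is literally named `httpd`) loops forever, handles many clients at once, and does far more checking.

```python
import socket


def serve_once(listener: socket.socket) -> str:
    """Toy HTTP daemon: wait for ONE client, answer it, return its request line.

    A real httpd loops forever, serves many clients at once, parses headers,
    serves files, and so on. This only shows the 'wait, then respond' idea.
    """
    conn, _addr = listener.accept()  # blocks here until a client connects
    with conn:
        # Read until the blank line (\r\n\r\n) that ends the request headers.
        data = b""
        while b"\r\n\r\n" not in data:
            chunk = conn.recv(4096)
            if not chunk:  # client hung up early
                break
            data += chunk
        request_line = data.split(b"\r\n", 1)[0].decode("ascii", "replace")

        # Send a fixed response: status line, headers, blank line, body.
        body = b"Hello from a tiny daemon\n"
        head = (
            "HTTP/1.1 200 OK\r\n"
            "Content-Type: text/plain\r\n"
            f"Content-Length: {len(body)}\r\n"
            "Connection: close\r\n"
            "\r\n"
        )
        conn.sendall(head.encode("ascii") + body)
    return request_line
```

To try it, create a listener with `socket.create_server(("127.0.0.1", 8080))` (port 8080 is an arbitrary choice), pass it to `serve_once`, and visit `http://127.0.0.1:8080/` in your browser; the function returns a request line along the lines of `GET / HTTP/1.1`.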

Side note: Daemons are independent programs, as opposed to their cousins, "demons," which function as part of a larger program. And yes, they are named after their mythological counterparts, though (as far as we know) they are far less malevolent.

Status Codes

When the server sends its response back to the client, the first thing it sends is a status line containing a status code. Status codes are three-digit numbers, each accompanied by a short reason phrase, that indicate whether the request succeeded or ran into some sort of complication. The one you want, obviously, is the one that says "Yeah, we got that; here's what you asked for." This is status code 200 OK. But since you don't get that every time, here's an explanation of some of the error codes you might receive.
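The status line is easy to take apart by hand. This hypothetical helper, `parse_status_line`, splits it into version, code, and reason phrase, and `status_class` looks at the code's first digit, which is how HTTP groups status codes into families: 2xx for success, 4xx for problems with the request, 5xx for problems on the server, and so on.

```python
def parse_status_line(line: str):
    """Split a status line such as 'HTTP/1.1 404 Not Found' into its three parts."""
    parts = line.strip().split(" ", 2)
    version, code = parts[0], int(parts[1])
    reason = parts[2] if len(parts) == 3 else ""  # the reason phrase may be empty
    return version, code, reason


def status_class(code: int) -> str:
    # The first digit of the code says which broad family the response belongs to.
    families = {
        1: "informational",
        2: "success",
        3: "redirection",
        4: "client error",
        5: "server error",
    }
    return families.get(code // 100, "unknown")


print(parse_status_line("HTTP/1.1 200 OK"))  # ('HTTP/1.1', 200, 'OK')
print(status_class(200))                     # success
print(status_class(404))                     # client error
```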

Error Codes

Error codes are status codes that indicate that there's been some sort of problem in getting you what you asked for. Annoying as they are, error codes serve an important purpose: they help identify the problem, thereby helping you fix it. Observe:

400—Bad Request: this one means the server couldn't make sense of the request, usually because it was malformed. The bits and bytes may have gotten jumbled, pieces may have gone missing, or the request itself (a URL with invalid characters, say) didn't follow the rules. Since the problem lies with the request rather than the server, double-checking the URL and refreshing the page are good first steps.

401—Unauthorized: this one means the page requires you to prove who you are, and you either haven't signed in or the credentials you gave were rejected. It pops up most frequently when you try to go somewhere that requires you to log in first. So this one's helpful for reminding you to sign on.

404—Not Found: probably the one you're most familiar with, this one pops up anytime the server can't find what you're looking for. Sometimes web pages or their contents get pulled down or deleted. When that happens, requesting them leads to a 404 code, telling you that what you're looking for isn't in the place you're looking.

418—I'm a teapot: this code started as an April Fools' joke (RFC 2324, the tongue-in-cheek Hyper Text Coffee Pot Control Protocol), intended to be returned by teapots asked to brew coffee. Now it's mostly just an Easter egg occasionally used by developers who want to have a giggle. If you're really paying attention, about half of the internet is just a thin veneer hiding treasure troves of jokes like this.
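Python's standard library ships a table of registered codes and their reason phrases in `http.HTTPStatus`, so looking up a code you've run into can be as short as the sketch below. The `describe` helper is our own; it falls back gracefully because older Python versions don't know every code (418 was only added to `HTTPStatus` in Python 3.9).

```python
from http import HTTPStatus


def describe(code: int) -> str:
    """Return the standard reason phrase for an HTTP status code, if Python knows it."""
    try:
        return HTTPStatus(code).phrase
    except ValueError:
        # Not in this Python version's table (e.g. 418 before Python 3.9).
        return "unknown status code"


for code in (200, 400, 401, 404, 418):
    print(code, describe(code))
```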

Caching

HTTP does a number of things to speed itself up, but the most notable is caching. Caching means storing copies of frequently used resources from a website (images, stylesheets, scripts) so that they don't have to be requested and retransmitted on later visits. The server uses HTTP headers such as Cache-Control to say how long a stored copy may be reused. It's kind of like a portion of the website is already preloaded into your web browser, and it does a lot to speed up browsing on the internet.
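As a sketch of the idea, here's a toy in-memory cache in Python. It's a deliberate simplification: it only understands the `max-age` directive of the `Cache-Control` header (the number of seconds a response may be reused), while real browser caches also revalidate with `ETag`/`If-None-Match`, respect `no-store`, and much more. The `TinyCache` class and its injectable `clock` are our own inventions for the example.

```python
import time


class TinyCache:
    """Toy browser-style cache that honors only Cache-Control: max-age."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.entries = {}  # url -> (body, expires_at)

    def store(self, url: str, body: bytes, cache_control: str) -> None:
        max_age = None
        for directive in cache_control.split(","):
            name, _, value = directive.strip().partition("=")
            if name.lower() == "max-age" and value.isdigit():
                max_age = int(value)
        if max_age is not None:  # no max-age: don't cache at all in this toy
            self.entries[url] = (body, self.clock() + max_age)

    def lookup(self, url: str):
        """Return the cached body if still fresh, else None (meaning: go to the network)."""
        entry = self.entries.get(url)
        if entry is None:
            return None
        body, expires_at = entry
        if self.clock() >= expires_at:
            del self.entries[url]  # stale: must be fetched again
            return None
        return body


# Usage with a fake clock so the example doesn't have to wait a minute.
now = [0.0]
cache = TinyCache(clock=lambda: now[0])
cache.store("https://example.com/cat.jpg", b"<jpeg bytes>", "public, max-age=60")
print(cache.lookup("https://example.com/cat.jpg"))  # b'<jpeg bytes>' (still fresh)
now[0] = 61.0
print(cache.lookup("https://example.com/cat.jpg"))  # None (stale, refetch)
```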

And speaking of speeding things up…

HTTP Versions

Early Versions

The very first version of HTTP was—get this—HTTP 0.9; not 1.0, or even 0.0 as a programmer might start counting. We don't know why they decided to begin it where they did, but it was 1991 and we were still shaking off the weirdness of the '80s. The more reasonably named HTTP 1.0 didn't show up until 1996.

HTTP 1.1

A vast improvement on both 0.9 and 1.0, HTTP 1.1 was first published in 1997 and revised in 1999. It made some important changes that drastically sped up the internet, one of the most important being persistent connections. Basically, in 1.0, requests were made one at a time, with a separate TCP connection established for each one. It's kind of like washing your dishes by putting them in the dishwasher one at a time, and running it to wash a single dish.

It works, but it's not a very effective way to get the job done.

1.1 improved this by keeping connections open (persistent connections), so many requests could travel over the same connection instead of each one paying the cost of setting up a new one. It also allowed pipelining: sending several requests without waiting for each response. This not only speeds up how fast requests can be sent, received, and responded to, but it also cuts down on overall internet traffic, which reduces latency, since fewer connections are hogging the network like your roommate who takes 45-minute showers and uses up all the hot water.
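Here's a sketch of an HTTP 1.1 persistent connection using Python's standard `http.client` and `http.server`. The handler (a made-up `KeepAliveHandler`) records the client's port number for every request it serves; because all the requests reuse one TCP connection, the port is the same each time. Note this shows connection reuse, not pipelining: `http.client` waits for each response before sending the next request.

```python
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler


class KeepAliveHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # lets http.server keep connections open
    seen_ports = []  # client port of each request: same port means same connection

    def do_GET(self):
        KeepAliveHandler.seen_ports.append(self.client_address[1])
        body = f"you asked for {self.path}".encode()
        self.send_response(200)
        # With HTTP/1.1 the length tells the client where this response ends.
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch_many(host: str, port: int, paths: list) -> list:
    """Send several requests over ONE TCP connection, HTTP/1.1 style."""
    conn = HTTPConnection(host, port)
    bodies = []
    try:
        for path in paths:
            conn.request("GET", path)
            resp = conn.getresponse()
            bodies.append(resp.read())  # read fully before reusing the connection
    finally:
        conn.close()
    return bodies
```

For example, run `HTTPServer(("127.0.0.1", 0), KeepAliveHandler)` in a background thread and call `fetch_many` with a few paths; `KeepAliveHandler.seen_ports` ends up holding the same port number once per request. The `http.server` documentation notes that once `protocol_version` is `HTTP/1.1`, every response must carry an accurate `Content-Length`, since that is how the client finds the boundary between responses on the shared connection.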

HTTP 2.0

2.0 is new; it didn't come out to greet the world until 2015. With hardware leaping and bounding in its upward momentum, software and programming need to keep up. Sometimes they don't, though, which is how we end up with 2015 internet capacity running 1999 HTTP. Version 2.0 aims to address some of the limitations that 1.1 imposes upon the internet, and add things that help speed it up and smooth it out.

2.0 added multiplexing, making it possible to send many requests and responses at the same time over a single TCP connection. Each message is chopped into small frames tagged with a stream ID, so frames from different requests can be interleaved on the wire and sorted back out on arrival, and responses no longer have to come back in the order the requests were sent. This cuts down on how many client/server connections have to be established.
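Here's a toy model of multiplexing in Python. It is not real HTTP/2, which uses a binary frame format with headers, flags, and flow control; it only shows the core trick: chop each response into frames tagged with a stream ID, interleave frames from different streams on one connection, and sort them back out on the other end. The functions `to_frames`, `interleave`, and `reassemble` are hypothetical helpers, and the odd stream IDs 1 and 3 mirror HTTP/2's convention for client-initiated streams.

```python
from collections import defaultdict
from itertools import zip_longest


def to_frames(stream_id: int, payload: bytes, frame_size: int):
    """Chop one response into (stream_id, chunk) frames. Toy model, not real HTTP/2 framing."""
    return [(stream_id, payload[i:i + frame_size]) for i in range(0, len(payload), frame_size)]


def interleave(*frame_lists):
    """Round-robin frames from several streams onto one shared 'connection'."""
    wire = []
    for group in zip_longest(*frame_lists):
        wire.extend(frame for frame in group if frame is not None)
    return wire


def reassemble(wire):
    """Receiver side: sort frames back into per-stream payloads using their stream IDs."""
    streams = defaultdict(bytes)
    for stream_id, chunk in wire:
        streams[stream_id] += chunk
    return dict(streams)


html = to_frames(1, b"<html>...</html>", 4)
css = to_frames(3, b"body{color:red}", 4)
wire = interleave(html, css)
print([sid for sid, _ in wire])  # [1, 3, 1, 3, 1, 3, 1, 3]
print(reassemble(wire))          # {1: b'<html>...</html>', 3: b'body{color:red}'}
```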

2.0 also adds stream prioritization (stream dependencies and weights), a fancy way of saying that your browser can tell the server which of the resources it's requesting are most important to get first.
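In HTTP/2's original prioritization scheme (RFC 7540), each stream can carry a weight from 1 to 256, and sibling streams are meant to share the connection's resources in proportion to those weights. A toy calculation of that split, with made-up file names and a made-up 1,000-byte budget; `share_bandwidth` is our own helper, not part of any library:

```python
def share_bandwidth(weights: dict, total_bytes: int) -> dict:
    """Toy model: divide capacity among sibling streams in proportion to weight (1-256)."""
    for name, weight in weights.items():
        if not 1 <= weight <= 256:
            raise ValueError(f"weight for {name!r} must be in 1..256")
    total_weight = sum(weights.values())
    return {name: total_bytes * weight / total_weight for name, weight in weights.items()}


print(share_bandwidth({"style.css": 192, "hero.jpg": 64}, 1000))
# {'style.css': 750.0, 'hero.jpg': 250.0}
```

Weights are a hint, not a command: servers are free to ignore them, and later revisions of HTTP/2 moved away from this particular scheme.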

At the time of this article's creation, the internet is still in the process of transitioning to 2.0. Oddly enough, this coincides with the transition from IPv4 to IPv6. So the internet's not just getting a facelift, it's getting reconstructive surgery.

HTTPS

Now, what do you do when you have sensitive and private information you need to transmit, like Social Security numbers or credit card info? Well, regular ol' HTTP can't help you much: it sends everything as plain text, which keeps dedicated hackers out about as well as a screen door keeps a hungry dog out of your kitchen. So, instead of using HTTP, you use HTTPS.

HTTPS stands for the same thing as HTTP, with an "S" for "Secure" tacked on. The long answer involves terms like "SSL" and "TLS," which we won't get into here. The short version works like this: HTTPS wraps your HTTP traffic in an encrypted connection. The two sides agree on secret keys, and only someone holding the right key can unlock the data, so anyone snooping along the way sees gibberish.

HTTPS sites also prove who they are with a certificate, which your browser checks before trusting the connection. You can tell it's working when there's a little padlock symbol at the beginning of the URL (some browsers also turn part of the address bar green). Green, of course, means you're good to go, and that your sensitive information is locked away from eavesdroppers behind that little padlock.
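In code, the difference between HTTP and HTTPS is mostly one extra step: wrapping the TCP connection in TLS before sending the very same HTTP text. Here's a sketch with Python's standard `ssl` module. The `make_https_context` and `https_get` names are our own; `ssl.create_default_context()` is the library's recommended starting point and turns on certificate verification and hostname checking.

```python
import socket
import ssl


def make_https_context() -> ssl.SSLContext:
    # The default context verifies the server's certificate and checks that it
    # matches the hostname, so you know you're talking to the site you asked for.
    return ssl.create_default_context()


def https_get(host: str, path: str = "/") -> bytes:
    """Send a plain HTTP/1.1 GET, but inside a TLS-encrypted connection on port 443."""
    context = make_https_context()
    with socket.create_connection((host, 443)) as raw_sock:
        # wrap_socket runs the TLS handshake: certificate check plus key exchange.
        with context.wrap_socket(raw_sock, server_hostname=host) as tls_sock:
            request = f"GET {path} HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
            tls_sock.sendall(request.encode("ascii"))
            chunks = []
            while True:
                chunk = tls_sock.recv(4096)
                if not chunk:
                    break
                chunks.append(chunk)
    return b"".join(chunks)
```

Calling `https_get("www.example.com")` needs network access. If the certificate doesn't check out, `wrap_socket` raises `ssl.SSLCertVerificationError` before any HTTP is sent.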

Conclusion

When it all comes down to it, HTTP is your go-to guy for getting stuff done on the internet. It's an evolving system, and one that's finally catching up to the world that's using it. It's the standard; it's become so prevalent that you don't even need to type "http://" at the beginning of a URL anymore, because it's just assumed.

So go ahead; impress your friends with your newfound knowledge of—and appreciation for—a part of the internet that goes unnoticed by so many.

Special thanks to SmartyStreets

Blog by S.Adhikari
