HTTP cookies, sometimes known as web cookies or just cookies, are parcels of text sent by a server to a web browser and then sent back unchanged by the browser each time it accesses that server. HTTP cookies are used for authenticating, tracking, and maintaining specific information about users, such as site preferences and the contents of their electronic shopping carts. The term "cookie" is derived from "magic cookie," a well-known concept in Unix computing which inspired both the idea and the name of HTTP cookies.
Cookies have been of concern for Internet privacy, since they can be used for tracking browsing behavior. As a result, they have been subject to legislation in various countries such as the United States and in the European Union. Cookies have also been criticised because the identification of users they provide is not always accurate and because they could potentially be used for network attacks. Some alternatives to cookies exist, but each has its own drawbacks.
Cookies are also subject to a number of misconceptions, mostly based on the erroneous notion that they are computer programs. In fact, cookies are simple pieces of data unable to perform any operation by themselves. In particular, they are neither spyware nor viruses, despite the detection of cookies from certain sites by many anti-spyware products.
Most modern browsers allow users to decide whether to accept cookies, but rejection makes some websites unusable. For example, shopping baskets implemented using cookies do not work if cookies are rejected.
HTTP cookies are used by Web servers to differentiate users and to maintain data related to the user during navigation, possibly across multiple visits. HTTP cookies were introduced to provide a way for realizing a "shopping cart" (or "shopping basket"),[1][2] a virtual device into which the user can "place" items to purchase, so that users can navigate a site where items are shown, adding or removing items from the shopping basket at any time.
Allowing users to log in to a website is another use of cookies. Users typically log in by inserting their credentials into a login page; cookies allow the server to know that the user is already authenticated, and therefore is allowed to access services or perform operations that are restricted to logged-in users.
Several websites also use cookies for personalization based on users' preferences. Sites that require authentication often use this feature, although it is also present on sites not requiring authentication. Personalization includes presentation and functionality. For example, the Wikipedia Web site allows authenticated users to choose the webpage skin they like best; the Google search engine allows users (even non-registered ones) to decide how many search results per page they want to see.
Cookies are also used to track users across a website. Third-party cookies and Web bugs, explained below, also allow for tracking across multiple sites. Tracking within a site is typically done with the aim of producing usage statistics, while tracking across sites is typically used by advertising companies to produce anonymous user profiles, which are then used to target advertising (deciding which advertising image to show) based on the user profile.
Technically, cookies are arbitrary pieces of data chosen by the Web server and sent to the browser. The browser returns them unchanged to the server, introducing a state (memory of previous events) into otherwise stateless HTTP transactions. Without cookies, each retrieval of a Web page or component of a Web page is an isolated event, mostly unrelated to all other views of the pages of the same site. By returning a cookie to a web server, the browser provides the server a means of connecting the current page view with prior page views. Other than being set by a web server, cookies can also be set by a script in a language such as JavaScript, if supported and enabled by the Web browser.
Cookie specifications[3][4] suggest that browsers should support a minimal number of cookies or amount of memory for storing them. In particular, an internet browser is expected to be able to store at least 300 cookies of 4 kilobytes each, and at least 20 cookies per server or domain.
Relevant count of maximum stored cookies per domain for the major browsers are:
Firefox 1.5: 50
Firefox 2.0: 50
Safari 3 public beta
Opera 9: 30
Internet Explorer 6: 20 (raised to 50 in update on 8/14/2007)
Internet Explorer 7: 20 (raised to 50 in update on 8/14/2007)
In practice cookies must be smaller than 4k. MSIE imposes a 4k total for all cookies stored in a given domain.
Cookie names are case insensitive according to section 3.1 of RFC 2965
The cookie setter can specify a deletion date, in which case the cookie will be removed on that date. If the cookie setter does not specify a date, the cookie is removed once the user quits his or her browser. As a result, specifying a date is a way for making a cookie survive across sessions. For this reason, cookies with an expiration date are called persistent. As an example application, a shopping site can use persistent cookies to store the items users have placed in their basket. This way, if users quit their browser without making a purchase and return later, they still find the same items in the basket so they do not have to look for these items again. If these cookies were not given an expiration date, they would expire when the browser is closed, and the information about the basket content would be lost
Since their introduction on the Internet, misconceptions about cookies have circulated on the Internet and in the media.[5][6] In 2005, Jupiter Research published the results of a survey,[7] according to which a consistent percentage of respondents believed some of the following false claims:
Cookies are like worms and viruses in that they can erase data from the user's hard disks
Cookies are a form of spyware in that they can read personal information stored on the user's computer
Cookies generate popups
Cookies are used for spamming
Cookies are only used for advertising
Cookies are in fact only data, not program code: they cannot erase or read information from the user's computer.[8] However, cookies allow for detecting the Web pages viewed by a user on a given site or set of sites. This information can be collected in a profile of the user. Such profiles are often anonymous, that is, they do not contain personal information of the user (name, address, etc.) More precisely, they cannot contain personal information unless the user has made it available to some sites. Even if anonymous, these profiles have been the subject of some privacy concerns.
According to the same survey, a large percentage of Internet users do not know how to delete cookies.
Cookies have some important implications on the privacy and anonymity of Web users. While cookies are only sent to the server setting them or one in the same Internet domain, a Web page may contain images or other components stored on servers in other domains. Cookies that are set during retrieval of these components are called third-party cookies.
In this fictional example, an advertising company has placed banners in two Web sites (which do not show any banner in reality). Hosting the banner images on its servers and using third-party cookies, the advertising company is able to track the browsing of users across these two sites.Advertising companies use third-party cookies to track a user across multiple sites. In particular, an advertising company can track a user across all pages where it has placed advertising images or web bugs. Knowledge of the pages visited by a user allows the advertisement company to target advertisement to the user's presumed preferences.
The possibility of building a profile of users has been considered by some a potential privacy threat, even when the tracking is done on a single domain but especially when tracking is done across multiple domains using third-party cookies. For this reason, some countries have legislation about cookies.
The United States government has set strict rules on setting cookies in 2000 after it was disclosed that the White House drug policy office used cookies to track computer users viewing its online anti-drug advertising to see if they then visited sites about drug making and drug use. In 2002, privacy activist Daniel Brandt found that the CIA had been leaving persistent cookies on computers for ten years. When notified it was violating policy, CIA stated that these cookies were not intentionally set and stopped setting them.[10] On December 25, 2005, Brandt discovered that the National Security Agency had been leaving two persistent cookies on visitors' computers due to a software upgrade. After being informed, the National Security Agency immediately disabled the cookies.[11]
The 2002 European Union telecommunication privacy Directive contains rules about the use of cookies. In particular, Article 5, Paragraph 3 of this directive mandates that storing data (like cookies) in a user's computer can only be done if: 1) the user is provided information about how this data is used; and 2) the user is given the possibility of denying this storing operation. However, this article also states that storing data that is necessary for technical reasons is exempted from this rule. This directive was expected to have been applied since October 2003, but a December 2004 report says (page 38) that this provision was not applied in practice, and that some member countries (Slovakia, Latvia, Greece, Belgium, and Luxembourg) did not even implement the provision in national law. The same report suggests a thorough analysis of the situation in the Member States.
IP address
An unreliable technique for tracking users is based on storing the IP addresses of the computers requesting the pages. This technique has been available since the introduction of the World Wide Web, as downloading pages requires the server holding them to know the IP address of the computer running the browser or the proxy, if any is used. This information is available for the server to be stored regardless of whether cookies are used or not.
However, these addresses are typically less reliable in identifying a user than cookies because computers and proxies may be shared by several users, and the same computer may be assigned different Internet addresses in different work sessions (this is often the case for dial-up connections). The reliability of this technique can be improved by using another feature of the HTTP protocol: when a browser requests a page because the user has followed a link, the request that is sent to the server contains the URL of the page where the link is located. If the server stores these URLs, the path of page viewed by the user can be tracked more precisely. However, these traces are less reliable than the ones provided by cookies, as several users may access the same page from the same computer, NAT router, or proxy and then follow two different links. Moreover, this technique only allows tracking and cannot replace cookies in their other uses.
Tracking by IP address can be impossible with some systems that are used to retain Internet anonymity, such as Tor. With such systems, not only could one browser carry multiple addresses throughout a session, but multiple users could appear to be coming from the same IP address, thus making IP address use for tracking wholly unreliable.
Some major ISPs, including AOL, route all web traffic through a small number of proxies which makes this scheme particularly unworkable.
URL (query string)
A more precise technique is based on embedding information into URLs. The query string part of the URL is the one that is typically used for this purpose, but other parts can be used as well. The PHP session mechanism uses this method if cookies are not enabled.
This method consists of the Web server appending query strings to the links of a Web page it holds when sending it to a browser. When the user follows a link, the browser returns the attached query string to the server.
Query strings used in this way and cookies are very similar, both being arbitrary pieces of information chosen by the server and sent back by the browser. However, there are some differences: since a query string is part of a URL, if that URL is later reused, the same attached piece of information is sent to the server. For example, if the preferences of a user are encoded in the query string of a URL and the user sends this URL to another user by e-mail, those preferences will be used for that other user as well.
Moreover, even if the same user accesses the same page two times, there is no guarantee that the same query string is used in both views. For example, if the same user arrives to the same page but coming from a page internal to the site the first time and from an external search engine the second time, the relative query strings are typically different while the cookies would be the same. For more details, see query string.
Other drawbacks of query strings are related to security: storing data that identifies a session in a query string enables or simplifies session fixation attacks, referer logging attacks and other security exploits. Transferring session identifiers as HTTP cookies is more secure.
Cookies can be used for allowing users to express preferences about a Web site. For example, the Google search engine allows the user to choose how many results are to be shown for every query, and this choice is maintained across sessions.
If a user was authenticated using the technique above, when they request a page the server is also sent the cookie associated with the user. It can therefore adapt the requested page to the stored used preferences. When authentication is not used, the user preferences are stored in a cookie. The users select their preference by entering them in a Web form and submitting it to the server. The server encodes them in a cookie and sends it back to the browser. This way, every time the user accesses a page, the server is also sent the cookie where the preferences are stored, and can personalise the page according to the user preferences.
For example, Google stores the user preferences in a cookie of name PREF. This cookie is created with default values when the user accesses the site for the first time. For example, the cookie value contains the string NR=10, that indicates a default preference of ten hits displayed in each page. If the user changes this number to 20 in the preference page, the server modifies the cookie with NR=20. Every time the user queries the search engine, the cookie is sent to the server along with the query. This way, the server knows how many hits should be shown in each page.