A URL (Uniform Resource Locator) is a unique identifier used to locate a resource on the Internet. It is also called a web address. A URL typically includes several elements, such as a protocol and a domain name, that tell a web browser how and where to retrieve a resource.
End users can type a URL directly into a browser's address bar, or they can follow hyperlinks found on web pages, in bookmark lists, in email, or in other applications.
What is the structure of a URL?
A URL contains the name of the protocol used to access a resource and the name of the resource itself. The first part specifies the protocol that serves as the primary access mechanism. The second part indicates where the resource is located, by IP address or by domain name, possibly including a subdomain.
Web resources are retrieved using HTTP (Hypertext Transfer Protocol) or HTTPS (HTTP Secure); email addresses are reached through mailto; files are transferred from an FTP server using FTP; and remote computers are accessed via telnet. Most schemes are followed by a colon and two forward slashes (for example, https://), while mailto is followed by a colon only (mailto:).
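As a rough illustration of how the scheme shapes the rest of a URL, the short Python sketch below splits a web address and a mailto address using the standard library's urllib.parse module. The addresses are made-up examples on the reserved example.com domain, not taken from any real site.

```python
from urllib.parse import urlsplit

# Hypothetical addresses used only for illustration.
web = urlsplit("https://www.example.com/index.html")
mail = urlsplit("mailto:someone@example.com")

# A web URL has "//" after the scheme, so the host lands in netloc.
print(web.scheme, repr(web.netloc), web.path)

# A mailto URL has only a colon after the scheme, so netloc is empty
# and the address is carried in the path component.
print(mail.scheme, repr(mail.netloc), mail.path)
```

The empty netloc for the mailto address reflects the missing double slash: without it, there is no host section for the parser to extract.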
Besides the domain, a URL can also specify:
· a path within a domain that points to a specific page or file;
· a network port to be used to connect to the site;
· a specific reference point within a file, for example a named anchor in HTML; and
· a query or search parameter – commonly found in URLs for search results.
Efficacy of URL design
URLs can be sent over the Internet only in the ASCII character set. Because URLs often contain characters outside that set, they must be converted into a valid ASCII form. URL encoding replaces each unsafe character with one or more sequences of a “%” followed by two hexadecimal digits. A URL cannot contain a literal space; a space is typically encoded as %20 or, in query strings, as a plus sign (+).
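The encoding step can be tried out with Python's urllib.parse module. This is a minimal sketch, assuming UTF-8 as the character encoding (the library default); the file name and the word used here are invented for the example.

```python
from urllib.parse import quote, quote_plus, unquote

# A space is not allowed in a URL, so it becomes %20.
encoded_space = quote("my file.html")

# A non-ASCII character is encoded byte by byte (UTF-8 assumed),
# so "é" turns into two percent-encoded bytes.
encoded_accent = quote("café")

# Form-style encoding, common in query strings, uses "+" for a space.
encoded_query = quote_plus("url encoding")

# Decoding reverses the process.
decoded = unquote(encoded_accent)

print(encoded_space)   # my%20file.html
print(encoded_accent)  # caf%C3%A9
print(encoded_query)   # url+encoding
print(decoded)         # café
```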
Parts of a URL
URLs can contain the following components:
· Protocol or scheme. Specifies how a resource on the Internet is accessed. HTTP, HTTPS, FTP, mailto, and telnet are among the available protocols.
· Host name or domain name. A name or address that uniquely identifies the server hosting the resource. The domain name system (DNS) translates a domain name into the server's IP address.
· Port number. Usually omitted from the URL, but always required to make the connection. Web servers use port 80 for HTTP by default, but other ports can be specified. When present, the port follows the host name and is preceded by a colon, for instance :80.
· Path. Paths refer to locations on a web server.
· Query. Found in URLs for dynamic pages. Query comprises a question mark followed by parameters.
· Parameters. This includes the information contained within a URL’s query string. Ampersands (&) are used to separate parameters.
· Fragment. An internal reference to a particular section within a page. It appears at the end of a URL and begins with a hash sign (#). For instance, the fragment #History in the URL https://en.wikipedia.org/wiki/Internet#History points to the History section of that page.
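The components listed above can be pulled apart programmatically. The sketch below uses Python's urllib.parse on a made-up URL; the host, port 8443, path, and parameter names are assumptions chosen so that every component is present.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical URL containing every component described in the list.
url = "https://www.example.com:8443/docs/page.html?q=url&lang=en#History"

parts = urlsplit(url)

print(parts.scheme)    # protocol or scheme
print(parts.hostname)  # host name or domain name
print(parts.port)      # port number (None when omitted from the URL)
print(parts.path)      # path on the server
print(parts.query)     # raw query string, without the leading "?"
print(parts.fragment)  # fragment, without the leading "#"

# Parameters are separated by "&"; parse_qs maps each name to a list
# of values because a name may appear more than once.
params = parse_qs(parts.query)
print(params)
```

Note that the parser strips the delimiters themselves: the question mark, the hash sign, and the colon before the port do not appear in the returned values.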
HTTP vs. HTTPS
A web browser uses HTTP or HTTPS to retrieve data from a web server and display content. The difference is that HTTPS encrypts the connection between the client and server using TLS (Transport Layer Security), the successor to Secure Sockets Layer (SSL), together with a server certificate, whereas HTTP sends data unencrypted.
Security protocols, such as HTTPS, are essential for protecting sensitive information, such as card numbers, passwords, and personal data.
TCP port 443 is used by default for HTTPS, whereas port 80 is used for HTTP.
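Because the default port is usually left out of the URL, software has to fill it in from the scheme. The following is a minimal sketch of that lookup; the DEFAULT_PORTS table and the effective_port helper are hypothetical names introduced for this example, and only the two schemes discussed here are covered.

```python
from urllib.parse import urlsplit

# Default ports for the two schemes discussed in this section.
DEFAULT_PORTS = {"http": 80, "https": 443}

def effective_port(url):
    """Return the explicit port if the URL has one, else the scheme default."""
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port
    # Returns None for schemes that are not in the table.
    return DEFAULT_PORTS.get(parts.scheme)

print(effective_port("http://www.example.com/"))        # 80
print(effective_port("https://www.example.com/"))       # 443
print(effective_port("https://www.example.com:8443/"))  # 8443
```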
URL vs. URI
A URL is the most common form of Uniform Resource Identifier (URI). A URI is a string of characters that identifies a resource on a network; a URL is a URI that also specifies how to locate that resource. URLs are essential when browsing the Web.
URL history
Retention of data, particularly about Web usage, has become a serious privacy issue. It has become increasingly important for search engines and application service providers to be transparent about what information they collect, retain, and sell.
For instance, Google updated its Chrome privacy policies in March 2019. According to the document, Chrome stores information locally when used in browser mode. This information includes the browsing history, meaning the pages visited and content cached from those pages, such as images, text, and other resources.
Google applies different retention periods to the data it collects. Users can delete data of their choosing at any time, and Google may delete some data automatically. Google may retain other data for longer periods if necessary.