PlacementBatches Classroomnotes 12/Apr/2022

Web Servers

Introduction

  • Websites are these things that exist on the internet, where people go to find stuff (technical description). Most of the web runs on Linux, with segmented and darker corners for Windows et al.
  • It’s typical for a lot of people to start their Linux careers working for either an ISP or a web host of some sort, meaning that a lot of newcomers to the field get thrown into the deep end of having to manage very public websites immediately. This isn’t a bad thing, as you tend to learn quickly in environments with a variety of issues, and when you’re surrounded by a host of other people who are all experiencing the same frustration day in, day out, it can be quite the learning experience.
  • There are many different components to the web, though the heyday of static HTML sites has been and gone.
  • Starting at the easiest, we’re going to look at actual web servers (that serve web content), databases (that hold web content), and TLS (which encrypts web content in transit).
  • We’re also going to look at some other pieces of technology that you’ll probably come across at some point (again, definitely if you work for a hosting provider). These are:

    mail transfer agents (MTAs, such as Postfix and Exim)
    NoSQL databases (such as MongoDB)
    fast key value (KV) stores (such as Redis)
    message brokers (such as RabbitMQ)

  • Don’t let any of these scare you; they’re just words on a page.

Installing and understanding a web server

  • A web server is the component you’re interacting with directly when you go to a website. It traditionally listens on port 80 (for Hypertext Transfer Protocol (HTTP)) or 443 (for Hypertext Transfer Protocol Secure (HTTPS)).
  • When you type a URL into your browser, these ports are generally hidden unless explicitly defined; for example, hitting https://directdevops.blog in a browser such as Google Chrome will load the website, but it won’t tell you that it’s connecting on port 443. In a similar fashion, if you go to https://directdevops.blog:443, the exact same page should load.
  • Also, if you try to go to port 80 using HTTPS (https://directdevops.blog:80/), you will generally get an error saying the site can’t provide a secure connection.
  • This is because you tried to talk to an insecure port (80) using a secure protocol (HTTPS).
  • Web servers literally serve the web, but they’re usually just the frontend to other technology.
  • Blog posts on a WordPress install, for example, might be stored in a database behind the scenes, while they’re presented to the end-user by the web server frontend.
  • It’s the job of the web server to decide what content to return to the requesting client, and how to deliver it.
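A browser works out the port from the URL scheme when none is given. As a rough illustration, here is a bash sketch of that rule; `default_port` is a hypothetical helper, and it assumes plain http:// or https:// URLs with no username or IPv6 literal in them.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the TCP port a browser would connect to for a URL.
# Assumes an http:// or https:// URL; an explicit :port wins over the default.
default_port() {
  local url=$1
  local scheme=${url%%://*}     # text before "://"
  local rest=${url#*://}        # text after "://"
  local hostport=${rest%%/*}    # strip any path
  if [[ $hostport == *:* ]]; then
    echo "${hostport##*:}"      # explicit port, e.g. https://host:80/
    return 0
  fi
  case $scheme in
    http)  echo 80 ;;
    https) echo 443 ;;
    *)     echo "unknown scheme: $scheme" >&2; return 1 ;;
  esac
}

default_port https://directdevops.blog        # 443
default_port https://directdevops.blog:80/    # 80: HTTPS spoken to a plain-HTTP port
```

The last example is exactly the mismatch described above: the browser speaks TLS, but the port it lands on expects plain HTTP.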

Installing httpd (Apache) on RedHat

  • As the title suggests, Red Hat ships the Apache HTTP Server under the package name httpd, which is also the name of the Apache server daemon itself.
  • Install httpd like so:
sudo yum install httpd -y
  • Red Hat doesn’t start services automatically when they’re installed, so let’s enable httpd at boot and start it now:
sudo systemctl enable --now httpd

Installing Nginx on Ubuntu

  • On our Ubuntu box, let’s install Nginx instead:
sudo apt update
sudo apt install nginx -y

How it works…

  • What we’ve done here is install two different web servers, though they accomplish the same thing.
  • Arguably, there’s no better and simultaneously worse place for standards compliance than the web, which means that, regardless of the web server you choose to use (Apache, Nginx), you should still be able to serve content in a consistent fashion.
  • The first server we installed was Apache, which for years was the “go to” web server and the one that is still considered “battle hardened” by a lot of the more traditional administrators out there.
  • At the time of writing, Apache is still seen as the bigger player, but Nginx has been rising to fame in recent years, and looks set to take over (more on this later).
  • We then installed Nginx on our Ubuntu box (though Apache is available too). Debian’s claim to fame of having thousands of packages available comes to the fore here, as it also has a slew of different web servers you might like to try (I only chose Apache and Nginx as the two biggest).
  • Regardless of which one you install, both of these systems are now more than capable of serving static HTML content to the internet (or at least to your little slice of the network, if it’s not publicly accessible).
  • If we look at the ss output on our Ubuntu box, we see the following:
$ ss -tuna
Netid State Recv-Q Send-Q Local Address:Port Peer Address:Port 
udp UNCONN 0 0 *:68 *:* 
tcp LISTEN 0 128 *:80 *:* 
tcp LISTEN 0 128 *:22 *:* 
tcp ESTAB 0 0 10.0.2.15:22 10.0.2.2:40136 
tcp ESTAB 0 0 127.0.0.1:56490 127.0.0.1:80 
tcp ESTAB 0 0 127.0.0.1:80 127.0.0.1:56490 
tcp LISTEN 0 128 :::80 :::* 
tcp LISTEN 0 128 :::22 :::* 
  • We can see port 80, listening on all available IPs, and we can see the established communication, which is actually coming from our forwarded web connection and Google Chrome. It’s the exact same story on the CentOS box.
  • All of this is great, and it means that when our client (Google Chrome in this example) requests content from the web server (Nginx here, or Apache on the CentOS box), that server is able to deliver the requested content in a fashion and style that the client can understand.
  • I mentioned other web servers, and it’s true that there are quite a few.
  • In OpenBSD land, you’ll probably find yourself using httpd (part of the OpenBSD base system), which isn’t a re-badged Apache (as is the case on CentOS) but completely different software that just happens to share the same name and perform similar functions…
  • Or, you might like the idea of Tomcat, which is less of a traditional web server, as it acts as a frontend to Java servlets (usually some sort of web application).
  • There’s lighttpd too, which is (as the name might suggest) supposed to be a lightweight web server, without the many bells and whistles of functionality that Nginx or Apache provide.
  • In the Windows world (a horrible place that I don’t like to visit), you get IIS, which is more of a suite of internet services that’s available on a Windows server.
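Going back to the ss output above: if you want a script to confirm that something is listening on port 80 rather than reading the table by eye, one option is to filter it with awk. `listening_on` is a hypothetical helper, and it assumes the column layout shown above (local address in the fifth column). On a live box, `ss -tln` already narrows the output to listening TCP sockets.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `ss -tuna` output on stdin and print the local
# address of every TCP socket in LISTEN state on the given port.
listening_on() {
  local port=$1
  awk -v port="$port" '
    NR > 1 && $1 == "tcp" && $2 == "LISTEN" {
      local_addr = $5
      # The port is whatever follows the last colon (works for *:80 and :::80)
      n = split(local_addr, parts, ":")
      if (parts[n] == port) print local_addr
    }'
}

# Usage on a live box:  ss -tuna | listening_on 80
```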

Basic Apache configuration

  • We’ve installed httpd on our CentOS machine, meaning that we’ve got a web server running on port 80 and we’re able to hit it from our Google Chrome installation on our host machine.
  • In this section, we’re going to take a look at how our server knows what to display and what we can do to set up a site of our own so that people aren’t greeted by the default Apache page when they visit our IP.
  • First, we should have a quick look at where the default content and configuration are coming from. The default test page tells us that content can be added to the /var/www/html/ directory, and that the page itself is controlled by /etc/httpd/conf.d/welcome.conf.
  • Let’s ls that directory to see what’s there already:
ls /var/www/html/
  • There’s nothing… odd.
  • Let’s put a basic index.html page in this directory, just to see what happens:
cat <<HERE | sudo tee -a /var/www/html/index.html
WELCOME TO WORLD OF LEARNING
HERE
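One thing to watch: `tee -a` appends, so running the command above twice leaves the line in index.html twice. Here is a sketch of an idempotent alternative using plain `tee`, which truncates the file first. `write_index` is a hypothetical helper. The docroot is a parameter so you can try it in a scratch directory, and setting SUDO=sudo is an assumed convention for writing under /var/www.

```shell
#!/usr/bin/env bash
# Hypothetical helper: (re)write DOCROOT/index.html with exactly the given text.
# Plain `tee` (no -a) truncates the file first, so re-running is harmless.
# Set SUDO=sudo when the docroot is root-owned, e.g. /var/www/html.
write_index() {
  local docroot=$1 text=$2
  printf '%s\n' "$text" | ${SUDO:-} tee "$docroot/index.html" > /dev/null
}

# On the server:  SUDO=sudo write_index /var/www/html 'WELCOME TO WORLD OF LEARNING'
```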
  • Now let’s visit our website once more: navigate to http://<ipaddress> (your server’s public IP) and you should see our new text instead of the default Apache page.
  • OK, so clearly this directory is being used for something, but it doesn’t explain where the configuration on what to display lives.
  • Let’s cat the suggested welcome file:
cat /etc/httpd/conf.d/welcome.conf 

# 
# This configuration file enables the default "Welcome" page if there
# is no default index page present for the root URL. To disable the
# Welcome page, comment out all the lines below. 
#
# NOTE: if this file is removed, it will be restored on upgrades.
#
<LocationMatch "^/+$">
    Options -Indexes
    ErrorDocument 403 /.noindex.html
</LocationMatch>

<Directory /usr/share/httpd/noindex>
    AllowOverride None
    Require all granted
</Directory>

Alias /.noindex.html /usr/share/httpd/noindex/index.html
Alias /noindex/css/bootstrap.min.css /usr/share/httpd/noindex/css/bootstrap.min.css
Alias /noindex/css/open-sans.css /usr/share/httpd/noindex/css/open-sans.css
Alias /images/apache_pb.gif /usr/share/httpd/noindex/images/apache_pb.gif
Alias /images/poweredby.png /usr/share/httpd/noindex/images/poweredby.png
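The `<LocationMatch "^/+$">` block is what ties the welcome page to the site root. Suppose a request for / finds no index file and directory listings are disabled (`Options -Indexes`). Apache then answers with a 403, and `ErrorDocument 403 /.noindex.html` swaps in the welcome page. The regular expression itself only matches paths made entirely of slashes, and a quick bash check makes that visible. `matches_root` is hypothetical; Apache uses PCRE, but this particular pattern behaves the same under bash’s `=~`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: does a URL path match welcome.conf's LocationMatch regex?
matches_root() {
  local re='^/+$'    # one or more slashes, nothing else
  [[ $1 =~ $re ]]
}

for path in / // /index.html /images/poweredby.png; do
  if matches_root "$path"; then
    echo "$path -> welcome page used if no index exists"
  else
    echo "$path -> normal handling"
  fi
done
```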
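  • The important takeaway is the first comment line:
    • This configuration file enables the default “Welcome” page if there is no default index page present for the root URL.
  • That’s why our index.html replaced the welcome page: once an index page exists at the root, the welcome page is no longer used.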
  • First, because you can have a large number of different websites on one web server (virtual hosts), let’s create a little segregation within our folder structure to keep different website files separate:
sudo mkdir /var/www/linux-learning
sudo mv /var/www/html/index.html /var/www/linux-learning/
  • Next, add the configuration that’s required for this directory to be read:
cat <<HERE | sudo tee -a /etc/httpd/conf.d/linux-learning.conf
<VirtualHost 127.0.0.1:80>
    ServerAdmin thebestlearner@world.com
    DocumentRoot "/var/www/linux-learning/"
    ServerName 127.0.0.1
    ServerAlias 127.0.0.1
</VirtualHost>
HERE
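If you expect to add more sites, it can be handy to generate this stanza from a few parameters instead of hand-editing it. The sketch below assumes a name-based vhost on `*:80`, which listens on every address, unlike the 127.0.0.1:80 used above. `write_vhost` and its argument order are hypothetical, and the output directory is a parameter so you can try it in a scratch directory first. On the server itself, `sudo apachectl configtest` checks the syntax before you reload.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write OUTDIR/SERVERNAME.conf with a name-based VirtualHost.
# Usage: write_vhost OUTDIR SERVERNAME DOCROOT [ALIAS...]
write_vhost() {
  local outdir=$1 name=$2 docroot=$3
  shift 3
  {
    echo "<VirtualHost *:80>"
    echo "    ServerName $name"
    # Only emit ServerAlias when aliases were given
    if (( $# > 0 )); then
      echo "    ServerAlias $*"
    fi
    echo "    DocumentRoot \"$docroot\""
    echo "</VirtualHost>"
  } > "$outdir/$name.conf"
}

# On the server (writing to /etc needs root), for example:
#   sudo bash -c "$(declare -f write_vhost); write_vhost /etc/httpd/conf.d example.com /var/www/example.com www.example.com"
#   sudo apachectl configtest && sudo systemctl reload httpd
```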
  • Then, we need to reload the configuration:
sudo systemctl reload httpd
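  • With this in place, nothing on the surface has changed, but you can now add more websites alongside this one. Note that this vhost is bound to 127.0.0.1:80, so it only answers requests that arrive on the loopback address. If you’re browsing to the server’s public IP, use <VirtualHost *:80> instead; otherwise Apache falls back to the main configuration and, since index.html has moved, shows the welcome page again.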
  • The reason we were able to drop a file into /var/www/html/ and view it in our browser was because of the DocumentRoot setting within the main Apache configuration file, which can be seen here:
cat /etc/httpd/conf/httpd.conf | grep ^DocumentRoot
DocumentRoot "/var/www/html"
  • The reason we used index.html as the filename, aside from it being convention, was because of the following line:
grep '^[[:space:]]*DirectoryIndex' /etc/httpd/conf/httpd.conf
    DirectoryIndex index.html
  • This dictates which file to load when a directory is requested.
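DirectoryIndex can list several file names, and Apache serves the first one that exists in the requested directory. Here is a small bash sketch of that lookup order. `resolve_index` is hypothetical, and it ignores everything else Apache does, such as permission checks.

```shell
#!/usr/bin/env bash
# Hypothetical helper: mimic DirectoryIndex lookup order.
# Usage: resolve_index DIR NAME...   (prints the first NAME that exists in DIR)
resolve_index() {
  local dir=$1 candidate
  shift
  for candidate in "$@"; do
    if [[ -f $dir/$candidate ]]; then
      echo "$dir/$candidate"
      return 0
    fi
  done
  return 1   # nothing found: Apache would fall back to a listing or a 403
}

# e.g. with "DirectoryIndex index.html index.php":
#   resolve_index /var/www/linux-learning index.html index.php
```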
  • While the /etc/httpd/conf/httpd.conf file is the default configuration file, we’re also able to add additional configuration for websites under the /etc/httpd/conf.d/ directory, as we did in this case.
  • We used a very specific stanza for our own configuration, as shown here:
<VirtualHost 127.0.0.1:80>
    ServerAdmin thebestlearner@world.com
    DocumentRoot "/var/www/linux-learning/"
    ServerName 127.0.0.1
    ServerAlias 127.0.0.1
</VirtualHost>
  • This stanza meant that while we could continue to host the same content as we did previously on our site, we’re also able to host other content too, with different DocumentRoots.
  • When we visited our site a second time, instead of being directed to /var/www/html as the DocumentRoot, we were instead pointed to /var/www/linux-learning/ because the preceding configuration told Apache to do so.
  • We also have a ServerName and a ServerAlias directive, though in this case the Alias does nothing.
  • ServerName is the domain or IP address that the end user typed into their browser. ServerAlias lists any other names that should be served by the same virtual host.
  • For example, you could have the following:
ServerName example.com
ServerAlias www.example.com fish.example.com europe.example.com
  • All of these would hit the same DocumentRoot.
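Under the hood, Apache picks a name-based virtual host by comparing the HTTP Host header with each vhost’s ServerName and ServerAlias. If nothing matches, the first vhost listed for that address and port acts as the default. The bash sketch below simplifies that matching. It reads a hypothetical one-vhost-per-line table of `DOCROOT SERVERNAME [ALIAS...]` in config order. It also ignores wildcards and case differences, and only strips any port from the Host header.

```shell
#!/usr/bin/env bash
# Hypothetical helper: choose a DocumentRoot for a Host header.
# stdin: one vhost per line, "DOCROOT SERVERNAME [ALIAS...]", in config order.
pick_docroot() {
  local host=${1%%:*}   # drop any ":port" suffix from the Host header
  local first="" name
  local -a fields
  while read -r -a fields; do
    if (( ${#fields[@]} == 0 )); then
      continue          # skip blank lines
    fi
    if [[ -z $first ]]; then
      first=${fields[0]}                # first vhost is the fallback
    fi
    for name in "${fields[@]:1}"; do    # ServerName, then aliases
      if [[ $name == "$host" ]]; then
        echo "${fields[0]}"
        return 0
      fi
    done
  done
  echo "$first"
}
```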
  • Virtual hosts only really come into their own when you have multiple domain names pointing at a server. In practice, you can have hundreds of different domain names pointing to one box, but because Apache is aware of the domain you’re using to connect (the browser sends it in the HTTP Host header), it will only serve the exact site you’ve requested.
  • In multi-tenant situations, it’s not uncommon for multiple clients to coexist on one server, only manipulating and updating their own files, oblivious to the fact they’re sharing a box with other companies and users.
  • In testing environments, you tend to see multiple websites on one box at once, because they’re usually lightweight and several can run in parallel. This presents a problem for testing domain name resolution though, as it can get costly and time-consuming to use public domain name services for test and temporary websites.
  • One solution to this problem is to use the /etc/hosts file (on Linux and Unix systems) instead.
$ cat /etc/hosts
127.0.0.1 centos1 centos1
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
  • You could add an additional line to this file, as follows:
192.168.33.11 mysuperlinux.com
# ipaddress of your server
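  • Now when you go to mysuperlinux.com in your browser, the name will be resolved to the IP address you specified, instead of going out to an external DNS server for name resolution. Remember that it’s the hosts file on the machine running the browser that matters, and that Apache needs a matching ServerName (or ServerAlias) for the site.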
  • In this way, you can have multiple “virtual hosts” on your Apache web server, and because your browser is requesting named sites (even if they’re all on the same IP address), you will get different content depending on the name you connected with.
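To check what a name will resolve to on the machine running the browser, `getent hosts mysuperlinux.com` asks the system resolver. That resolver normally consults /etc/hosts before DNS, with the order set in /etc/nsswitch.conf. The sketch below mimics only the /etc/hosts part, so it can be tried against any file in hosts format. `hosts_lookup` is hypothetical and compares names case-sensitively, which real resolution does not.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the address for NAME from a hosts-format file.
# Format per line: ADDRESS NAME [ALIASES...]; text after '#' is a comment.
# Stops at the first matching line; exits non-zero if nothing matches.
hosts_lookup() {
  local name=$1 file=${2:-/etc/hosts}
  awk -v name="$name" '
    { sub(/#.*/, "") }     # drop comments (fields are re-split afterwards)
    {
      for (i = 2; i <= NF; i++)
        if ($i == name) { print $1; found = 1; exit }
    }
    END { exit !found }
  ' "$file"
}

# e.g.  hosts_lookup mysuperlinux.com /etc/hosts
```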

Basic Nginx configuration

  • Heading on to our Ubuntu server now, we’re going to have a look at the default Nginx page that we can see when we visit http://<ubuntu-server-public-ip>, and we’re going to replace this text with our own message.
  • Nginx, as we stated previously, is growing in popularity. It has become a go-to web server because of its ease of use and its flexibility when required. (Not that this is a marketing pitch; Nginx and Apache are both open source and free.)
  • Our default Nginx page doesn’t have any pointers on where to look for configuration changes, only pointing you to the official documentation (which is well worth a peruse) and a commercial support offering.
  • This default page actually lives in a very similar location to the one we’ve just been examining on RedHat:
$ cat /var/www/html/index.nginx-debian.html 
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
    body {
        width: 35em;
        margin: 0 auto;
        font-family: Tahoma, Verdana, Arial, sans-serif;
    }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>

<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>

<p><em>Thank you for using nginx.</em></p>
</body>
</html>
  • Note that this file is called index.nginx-debian.html and that it’s the only file in /var/www/html to begin with.
  • Like Apache, Nginx has a concept of virtual hosts (Nginx calls them server blocks), which we’re going to configure in /etc/nginx/conf.d/.
  • Let’s start by creating some content:
sudo mkdir /var/www/linuxlearning
cat <<HERE | sudo tee -a /var/www/linuxlearning/index.html
How come I'm in one book, then I just disappear?
HERE

  • Now we can add a server block to our chosen virtual hosts directory:
$ cat <<HERE | sudo tee /etc/nginx/conf.d/linuxlearning.conf
server {
    listen 80;
    listen [::]:80;

    root /var/www/linuxlearning;
    index index.html;

    server_name 127.0.0.1;

    location / {
        try_files \$uri \$uri/ =404;
    }
}
HERE
  • Then, we need to reload Nginx:
sudo systemctl reload nginx
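  • Now we should be able to see our question in the browser, as long as the host name we request matches server_name (127.0.0.1 here, for example via a forwarded port). If you browse to the server’s public IP instead, Nginx won’t match this block and hands the request to the default site in /etc/nginx/sites-enabled/default (marked default_server), so either set server_name to that IP or remove the default site’s symlink.
  • Our default Nginx configuration file is located at /etc/nginx/nginx.conf. On this Ubuntu installation, it sets things like the process ID location, along with the user that Nginx will run as (www-data here):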
head /etc/nginx/nginx.conf
user www-data;
worker_processes auto;
pid /run/nginx.pid;
include /etc/nginx/modules-enabled/*.conf;

events {
  worker_connections 768;
  # multi_accept on;
}
  • Within this file, there also exists the following block of configuration:
##
# Virtual Host Configs
##

include /etc/nginx/conf.d/*.conf;
include /etc/nginx/sites-enabled/*;
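The two include lines use different globs: conf.d/*.conf only picks up files ending in .conf, while sites-enabled/* picks up every entry. Below is a small bash sketch that lists what each glob would match under a base directory. It shows, for example, why a leftover linuxlearning.conf.bak in conf.d is ignored. `nginx_includes` is hypothetical; on a real server, `sudo nginx -T` dumps the configuration Nginx actually loads.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the files the two include globs would pick up.
# Usage: nginx_includes [BASE]   (BASE defaults to /etc/nginx)
nginx_includes() {
  local base=${1:-/etc/nginx} f
  for f in "$base"/conf.d/*.conf "$base"/sites-enabled/*; do
    # An unmatched glob stays as the literal pattern, so skip anything absent
    if [[ -e $f || -L $f ]]; then
      echo "$f"
    fi
  done
}
```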
  • Note that the first include directory, /etc/nginx/conf.d/, is the one we chose to use for our configuration.
  • When we placed the linuxlearning.conf configuration in the /etc/nginx/conf.d/ directory, we were instructing Nginx to load this configuration, along with everything else it loads at launch.
  • Let’s look at our configuration:
server {
    listen 80;
    listen [::]:80;

    root /var/www/linuxlearning;
    index index.html;

    server_name 127.0.0.1;

    location / {
        try_files $uri $uri/ =404;
    }
}
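  • The listen directives are fairly straightforward: listen 80 accepts IPv4 connections on port 80 on all addresses, and listen [::]:80 does the same for IPv6. If you had multiple IP addresses on a box, they might be narrowed to a specific address, such as listen 192.168.33.11:80;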
  • Next, our root entry is the root location of website files. Here, it’s set to the one we chose to create for our great question.
  • index is the name of the file to load when Nginx enters the directory. The standard index.html is used here.
  • And, like Apache, server_name is the domain name or IP address that the end user is hoping to receive content for. It could be a string of names, for example:
server_name linuxlearning.com learning.linux.com;
  • Lastly, the try_files line within the location block tells Nginx to look for a file matching the requested URI ($uri), then for a directory of that name ($uri/), and to return a 404 if neither exists.
  • You can test this by trying to go to a non-existent file in your browser, for example, http://<ubuntu-publicip>/devops
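The try_files line is checked left to right: first a file at the requested path, then a directory of that name, then the fallback status code. Below is a simplified bash sketch that only reports which branch would apply for a given root and URI. `try_files` here is a hypothetical shell function, not Nginx’s implementation. For a directory, Nginx would go on to look for the index file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which branch of "try_files $uri $uri/ =404" applies.
# Usage: try_files ROOT URI
try_files() {
  local root=$1 uri=$2
  if [[ -f $root$uri ]]; then
    echo "file $root$uri"
  elif [[ -d $root$uri ]]; then
    echo "dir $root${uri%/}/"   # Nginx would then look for the index file
  else
    echo "404"
  fi
}

# e.g.  try_files /var/www/linuxlearning /devops
```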
  • If we wanted a different response, we could change the 404 to a 403 and reload the Nginx config:
$ sudo sed -i 's/404/403/g' /etc/nginx/conf.d/linuxlearning.conf 
$ sudo systemctl reload nginx
  • You might be wondering about using systemctl reload, and why I chose to use that instead of restart.
  • The answer should become clearer when we cat the systemd unit file for Nginx:
systemctl cat nginx | grep Reload
ExecReload=/usr/sbin/nginx -g 'daemon on; master_process on;' -s reload
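  • There’s a specific ExecReload line that runs nginx with the -s reload flag. This sends the reload signal (SIGHUP) to the Nginx master process, which re-reads its configuration and starts new worker processes while the old workers finish their current requests and then exit. That’s much less disruptive than a restart, which stops the whole service.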
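To see why a reload is gentler, here is a toy bash process that reacts to SIGHUP roughly the way the Nginx master does. Instead of terminating (the default action for SIGHUP), it keeps running and re-reads its configuration. `toy_daemon` is purely illustrative and shares nothing with Nginx’s code. With Nginx itself, `sudo nginx -t && sudo systemctl reload nginx` validates the configuration before reloading.

```shell
#!/usr/bin/env bash
# Toy illustration only: a long-running loop that re-reads CONF on SIGHUP
# instead of exiting (the default action for SIGHUP is to terminate).
# Usage: toy_daemon CONF LOG &   then   kill -HUP <pid>
toy_daemon() {
  local conf=$1 log=$2
  echo "started with: $(<"$conf")" >> "$log"
  trap 'echo "reloaded: $(<"$conf")" >> "$log"' HUP
  while :; do
    sleep 0.1   # short sleeps so the trap runs promptly
  done
}
```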
  • In Debian and Debian-like distributions, the concept of a sites-enabled and sites-available directory has become commonplace.
  • Theoretically, any sites you have on your box could go in the sites-available directory, and once you’re happy with them, you create a symlink to the sites-enabled directory.
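Debian’s Apache packaging ships a2ensite and a2dissite for this step, but the Nginx packages don’t include an equivalent, so the symlink is usually made by hand with `ln -s`. Here is a sketch of a helper, assuming the standard sites-available/sites-enabled layout. `nginx_ensite` is hypothetical, and the base directory is a parameter so it can be tried outside /etc/nginx.

```shell
#!/usr/bin/env bash
# Hypothetical helper: enable a site by symlinking it into sites-enabled.
# Usage: nginx_ensite SITE [BASE]   (BASE defaults to /etc/nginx; needs root there)
nginx_ensite() {
  local site=$1 base=${2:-/etc/nginx}
  local src=$base/sites-available/$site
  if [[ ! -f $src ]]; then
    echo "no such site: $src" >&2
    return 1
  fi
  # -s: symbolic link, -f: replace an existing entry of the same name
  ln -sf "$src" "$base/sites-enabled/$site"
}

# Afterwards:  sudo nginx -t && sudo systemctl reload nginx
```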
  • Next Step: Setting up SSL, TLS


About continuous learner

devops & cloud enthusiastic learner