Need some linux help with the server

Started by Gryzor, 10:21, 25 April 25

Gryzor

Hello everyone!

As you may have noticed, for quite a while now the server has been occasionally slow and sometimes not responding at all. The reason is (again) an influx of bots hitting the site.

I updated the blocking rules for Asian countries*, but until we see if this resolves the issue I'd like to investigate some more.

So here's the help I need, if there are any Linux experts in here:

I would like to monitor the incoming connections for a little while. That's easy to do, in a multitude of ways. BUT, I would like to export those connections in an easily readable format to parse in Excel and sort, so I can identify IP ranges. And I haven't found a solution to this.

So if anyone knows of a solution, pretty please, step forward :)

Thanks!


*I seem to remember a couple of people were having issues with this, so please get in touch

ajcasado

Hi @ikonsgr,
I'm a Linux user, but not an expert. Anyway, I asked ChatGPT and got the following response; I hope it fits your needs:


You can use ss or netstat in combination with a loop and some filtering to log incoming connections, then export to CSV for Excel. Here's a simple example using ss:
ss -ntu state established | awk 'NR>1 {print $5}' | cut -d':' -f1 | sort | uniq -c | sort -nr > connections.txt

This gives you a count of IPs. If you want a CSV format:
ss -ntu state established | awk 'NR>1 {print $5}' | cut -d':' -f1 | sort | uniq -c | awk '{print $2","$1}' > connections.csv

That outputs IP,count rows, ready to open in Excel.
For continuous monitoring, wrap it in a loop or use tcpdump with -tttt and export the output for parsing.
Let me know if you need a more persistent/logging setup.
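
A minimal sketch of the same idea as a reusable filter, assuming the input is one peer address per line in ss's addr:port form. The function name peers_to_csv is made up for illustration. Unlike cut -d: -f1, it strips only the trailing port, so bracketed IPv6 peers survive intact:

```shell
# Hypothetical helper: turn "addr:port" lines into "ip,count" CSV rows,
# most frequent address first.
peers_to_csv() (
    # Drop the trailing :port, then the [ ] that ss puts around IPv6 addresses.
    sed -e 's/:[0-9]*$//' -e 's/^\[//' -e 's/]$//' |
        sort | uniq -c | sort -rn |
        awk '{print $2 "," $1}'
)
```

It could be fed from the ss pipeline above, for example: ss -ntu state established | awk 'NR>1 {print $5}' | peers_to_csv > connections.csv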

CPC 664

Empiezas a envejecer cuando dejas de aprender.
You start to get old when you stop learning.

Gryzor

I'm not ikonsgr :D but thanks for your trouble, really appreciate it! It hadn't occurred to me to use AI for that.

I think netstat has been deprecated, though you can use it in a hackish way to get results. ss works quite a bit better, and so does tcpdump if you can simplify the output. I tried your solutions and got some interesting results; they need some work to give me exactly what I need, but they work.

I also asked Perplexity, and it gave me this nice, simple script, which I modified and which also works fine (you may notice I didn't ask for unique IPs):

#!/bin/bash
# Once per second, log the remote IP of every TCP connection involving port 80.
output_file="connection_ips.txt"
echo "IP Address,Timestamp" > "$output_file"
monitoring_time=120 # 2 minutes

end=$((SECONDS + monitoring_time))
while [ "$SECONDS" -lt "$end" ]; do
    # Field 5 of `netstat -tn` is the foreign address (ip:port); keep only the IP.
    netstat -tn | grep ':80 ' | awk '{print $5}' | cut -d: -f1 | while read -r ip; do
        echo "$ip,$(date +"%Y-%m-%d %H:%M:%S")" >> "$output_file"
    done
    sleep 1
done

echo "Monitoring complete. Results saved to $output_file"
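
Since the goal is to spot IP ranges rather than single addresses, the file written by the script above could be rolled up by /24 network. This is only a sketch: the function name summarise_by_slash24 is invented here, it assumes the "IP Address,Timestamp" layout shown above, and it silently skips anything that is not a dotted IPv4 address.

```shell
# Hypothetical helper: count logged connections per IPv4 /24 network.
# $1 is the CSV produced by the monitoring script (header + "ip,timestamp" rows).
summarise_by_slash24() (
    tail -n +2 "$1" |                # skip the CSV header line
        cut -d, -f1 |                # keep the IP column
        awk -F. 'NF == 4 {print $1 "." $2 "." $3 ".0/24"}' |
        sort | uniq -c | sort -rn |
        awk '{print $2 "," $1}'      # emit "network,count" for Excel
)
```

Usage would be along the lines of: summarise_by_slash24 connection_ips.txt > ranges.csv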

Bryce

You could add a log rule to your iptables firewall config. That would log the connections in the Kernel log file. You can then filter them out to analyse them.

Bryce.
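
A rough sketch of what that could look like. The rule in the comment and the "NEWCONN: " prefix are illustrative choices, not something taken from the server's actual firewall config, and where the kernel messages end up (journalctl -k, /var/log/kern.log, ...) depends on the distribution. The function name log_sources_to_csv is likewise made up.

```shell
# Example rule (needs root, shown as a comment only): log each new TCP
# connection to port 443 with a recognisable prefix.
#   iptables -I INPUT -p tcp --dport 443 -m conntrack --ctstate NEW \
#       -j LOG --log-prefix "NEWCONN: "

# Hypothetical helper: read kernel log lines on stdin and print "ip,count"
# for the SRC= field of every line carrying the prefix above.
log_sources_to_csv() (
    grep 'NEWCONN: ' |
        sed -n 's/.* SRC=\([^ ]*\) .*/\1/p' |
        sort | uniq -c | sort -rn |
        awk '{print $2 "," $1}'
)
```

On a systemd machine this might be used as: journalctl -k | log_sources_to_csv > sources.csv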

Gryzor

Quote from: Bryce on 11:03, 25 April 25
You could add a log rule to your iptables firewall config. That would log the connections in the Kernel log file. You can then filter them out to analyse them.

Bryce.
Yeah, I've been resisting using iptables until I realize there's no simpler/better way😁

ajcasado

Quote from: Gryzor on 10:59, 25 April 25
I'm not ikonsgr :D
Oops, my brain must've lagged harder than the server, apologies chief! :picard2: , anyway, glad I could help!

I haven't tried Perplexity yet, but the script it generated looks solid, definitely worth a shot.

Gryzor

Quote from: ajcasado on 11:08, 25 April 25
Quote from: Gryzor on 10:59, 25 April 25
I'm not ikonsgr :D
Oops, my brain must've lagged harder than the server, apologies chief! :picard2: , anyway, glad I could help!

I haven't tried Perplexity yet, but the script it generated looks solid, definitely worth a shot.


I think the original script had an error (didn't try it, I edited it before running, maybe it was my idea and it was fine).

Perplexity is pretty nice actually, I found a pro subscription on the cheap and I use it for some research, really loving it. Their own LLM is pretty good (though it offers others, too).

The only drawback compared to ChatGPT is it doesn't have long-term memory (I think only ChatGPT does?).

lmimmfn

You can use:
lsof -i

It will show the incoming connections to the server in a columnar format that can be imported into Excel. Here's a sample output:
COMMAND      PID    USER  FD  TYPE    DEVICE SIZE/OFF NODE NAME
node      667384 lmimmfn  63u  IPv4 197438208      0t0  TCP localhost:40022->localhost:27017 (ESTABLISHED)
node      667384 lmimmfn  64u  IPv4 197438209      0t0  TCP localhost:40036->localhost:27017 (ESTABLISHED)
node      667384 lmimmfn  65u  IPv4 197438210      0t0  TCP localhost:40048->localhost:27017 (ESTABLISHED)
node      667384 lmimmfn  66u  IPv4 197438211      0t0  TCP localhost:40052->localhost:27017 (ESTABLISHED)
node      667384 lmimmfn  67u  IPv4 197438230      0t0  TCP localhost:49218->localhost:6379 (ESTABLISHED)
node      667384 lmimmfn  68u  IPv4 197438231      0t0  TCP localhost:49232->localhost:6379 (ESTABLISHED)
node      667384 lmimmfn  69u  IPv4 197438232      0t0  TCP localhost:49236->localhost:6379 (ESTABLISHED)
node      667384 lmimmfn  70u  IPv6 197438874      0t0  TCP *:8585 (LISTEN)
node      667384 lmimmfn  71u  IPv4 197438875      0t0  TCP localhost:45638->localhost:50051 (ESTABLISHED)
node      667384 lmimmfn  72u  IPv4 197438879      0t0  TCP localhost:48272->localhost:27017 (ESTABLISHED)
node      667384 lmimmfn  73u  IPv4 197438884      0t0  TCP localhost:48282->localhost:27017 (ESTABLISHED)
node      667384 lmimmfn  74u  IPv6 197441725      0t0  TCP lmimmfn-dev:8585->100.120.44.137:52915 (ESTABLISHED)
node      667384 lmimmfn  76u  IPv6 197441727      0t0  TCP lmimmfn-dev:8585->100.120.44.137:52916 (ESTABLISHED)
node      667384 lmimmfn  77u  IPv6 197441728      0t0  TCP lmimmfn-dev:8585->100.120.44.137:52917 (ESTABLISHED)
node      667384 lmimmfn  78u  IPv6 197441730      0t0  TCP lmimmfn-dev:8585->100.120.44.137:52919 (ESTABLISHED)
node      667384 lmimmfn  79u  IPv6 197441729      0t0  TCP lmimmfn-dev:8585->100.120.44.137:52918 (ESTABLISHED)
node      667384 lmimmfn  80u  IPv6 197441731      0t0  TCP lmimmfn-dev:8585->100.120.44.137:52920 (ESTABLISHED)
node      667384 lmimmfn  81u  IPv6 197441732      0t0  TCP lmimmfn-dev:8585->100.120.44.137:52933 (ESTABLISHED)

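Here I have a web server running on port 8585 and sessions open from 100.120.44.137. You probably have the server running on port 443, so you can pipe to grep to get just those connections. Add -nP so lsof keeps addresses and ports numeric; by default it prints well-known ports by service name (443 shows up as "https"), and the grep would then match nothing:
lsof -nP -i | grep ":443->"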

If you want just the connections, you can pipe it to awk, as the connections are in the 9th column:
lsof -i | awk '{print $9}'

If you want just the list of connected incoming IPs, you can use the following (-nP keeps addresses and ports numeric, so ":443" matches):
lsof -nP -i | awk '{print $9}' | grep ":443->" | sed 's/.*->//'

Here's sample output, filtering the above example on port 8585:
lsof -i  | awk '{print $9}' | grep ":8585->" | sed 's/.*->//'
100.120.44.137:52915
100.120.44.137:52916
100.120.44.137:52917
100.120.44.137:52919
100.120.44.137:52918
100.120.44.137:52920
100.120.44.137:52933

You could run the above from crontab (say, every 5 minutes) and append the output to a file. Use >> rather than >, because > overwrites the file on every run. For example, to log to /tmp/ip_addresses.log (on your server, put the file somewhere other than /tmp):
lsof -nP -i | awk '{print $9}' | grep ":443->" | sed 's/.*->//' >> /tmp/ip_addresses.log

To add it to crontab:
crontab -e
will open the crontab file for editing. Add the line:
*/5 * * * * lsof -nP -i | awk '{print $9}' | grep ":443->" | sed 's/.*->//' >> /tmp/ip_addresses.log
and save it (in vi: Escape, then ":wq").

Then, to get the unique ones, just run:
cat /tmp/ip_addresses.log | sort | uniq

You can also aggregate the count of connections from the same IP address. Each logged entry still ends in the client port, so strip that first:
sed 's/:[0-9]*$//' /tmp/ip_addresses.log | sort | uniq -c


6128 for the win!!!

Gryzor

I *think* lsof was the first I tried, but the output format was not readable if directed to a file. But will try it again to double check, thanks! 

genesis8

I am using a Debian 12 VM on OVH.

For plain stats on Apache visitors, I am using AWStats.

I am also using Fail2ban, with the following personal rules to ban:

- too many 404 errors (e.g. probing for well-known WordPress pages)
- SSH access from outside a few authorized IP ranges (even though only one user is authorized for SSH access)
- errors from my own IP ranges are ignored, because I have already been banned by my own rules :-)

Access to phpMyAdmin is limited in the Apache configuration to the same authorized IP ranges (my ISP at home and my phone's).
____________
Amstrad news site at Genesis8 Amstrad Page

lmimmfn

@genesis8 - I don't think it's a good idea to post the kinds of security rules your public server uses, even at a high level; better to just PM the info.

Gryzor

Soooo in the end I didn't find a way to do it easily and in a nice format...

So I made a script that runs tcpdump on the relevant ports, parses the data with geoiplookup and then sorts the results into an output file. Funnily enough, I wasted the most time wrestling with grep to parse the data dump :D
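
The actual script isn't shown in the thread, so the following is only a guess at its general shape. Both function names are invented; the parsing assumes tcpdump's usual text output for IPv4 ("... IP src.port > dst.port: ...") and geoiplookup's usual "GeoIP Country Edition: US, United States" line.

```shell
# Hypothetical helper: print the source IPv4 address of every packet line
# in tcpdump text output read from stdin (IPv6 lines are skipped).
tcpdump_sources() (
    awk '{
        for (i = 1; i < NF; i++)
            if ($i == "IP") {               # IPv4 packets only
                src = $(i + 1)
                sub(/\.[0-9]+$/, "", src)   # drop the trailing .port
                print src
                break
            }
    }'
)

# Hypothetical helper: read IPs on stdin and tally country names.
# Assumes geoiplookup is installed and prints the line format noted above;
# addresses it cannot resolve are simply left out of the tally.
count_countries() (
    while read -r ip; do
        geoiplookup "$ip" | sed -n 's/^GeoIP Country Edition: [A-Z0-9-]*, //p'
    done | sort | uniq -c | sort -rn
)
```

Something like tcpdump -nn -r capture.pcap 'tcp dst port 443' | tcpdump_sources | count_countries would then give a per-country tally; filtering on the destination port keeps the server's own replies out of the count.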

Soooo, bad news I'm afraid:

=== Connection Counts by Country ===
  30351 United States
  28362 Canada
    658 Germany
    598 Greece
    450 Belgium
    380 United Kingdom
    366 Spain
    366 France
    344 Unknown
    342 Denmark
    340 Australia
    336 Ireland
    303 Brazil
    283 Czech Republic
    273 Russian Federation
    166 Saudi Arabia
    145 Romania
     55 Austria
     54 Sweden
     51 Croatia

Those are the top countries according to incoming connections over the course of five minutes (not sure about the absolute numbers; I think that loading a single page results in multiple connections?).

The top two, by far, are USA and Canada. Which would mean that, maybe, these are stupid bots belonging to some big services - perhaps some LLM crawlers? And, what do we do now? Can't quite block the entire US and Canada? Maybe blocking IPs, but that's risky, damn it...

eto

What user agents do you see there? 

Gryzor

I don't. I only parse the dump for IPs, and right now I'm trying to set up a whois lookup, but I think I may be getting throttled by the whois server...

But let me try and see how the user agent is represented in the dump and amend the script.

Gryzor

By the way, I resolved a few IPs and they're Alibaba...

Gryzor

Ok, pulled user agents, not much help...

=== Connection Counts by User Agent ===
    26 okhttp/4.12.0
      4 Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36 Vivaldi/5.3.2679.68
      2 yacybot (/global; amd64 Linux 6.8.0-51-generic; java 21.0.7; Etc/en) http://yacy.net/bot.html
      2 WPMU DEV Broken Link Checker Local Engine
      2 SiteUptime.com
      2 Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Brave Chrome/84.0.4147.105 Safari/537.36
      2 Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:138.0) Gecko/20100101 Firefox/138.0
      2 Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36
      2 Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36
      2 Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36
      1 Mozilla/5.0+(compatible; UptimeRobot/2.0; http://www.uptimerobot.com/)
      1 Mozilla/5.0 (compatible; theoldreader.com; support@theoldreader.com; 1 subscribers; feed-id=0092bf7a4794e7be1b91edce)
      1 Mozilla/5.0 (compatible; BLEXBot/1.0; +https://help.seranking.com/en/blex-crawler)
      1 Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/30.0.1599.66 Safari/537.36
      1 Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.5060.66 Safari/537.36
      1 Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.3928.815 Safari/537.36
      1 Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.1183.871 Safari/537.36




The way I did it:

tshark -r "$TMP_FILE" -Y 'http.request' -T fields -e http.user_agent 2>/dev/null | \
    grep -v '^$' >> "$UA_LIST"

Am I doing something wrong? The numbers don't line up...
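
One possible explanation, offered as an assumption rather than a diagnosis: in a packet capture, only plain-HTTP requests carry a readable User-Agent header, while HTTPS requests are encrypted on the wire, so tshark would see far fewer user agents than connections. If the site runs Apache with the standard "combined" log format (also an assumption), the access log records the user agent for both. A sketch, with the invented name count_user_agents:

```shell
# Hypothetical helper: tally user agents from Apache "combined" format log
# lines on stdin. Splitting on double quotes makes field 6 the User-Agent.
count_user_agents() (
    awk -F'"' 'NF >= 7 {print $6}' |
        sort | uniq -c | sort -rn |
        sed 's/^ *//'               # trim the padding uniq -c adds
)
```

For instance: count_user_agents < /var/log/apache2/access.log (the path is the Debian default and may differ on this server).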


redbox

You could try blocking the Alibaba IP ranges and see if anyone complains (redirect to a wait page, as this usually makes the bots lose interest).

It's a nightmare whack-a-mole scenario to do it yourself though. Cloudflare as a reverse proxy is much easier and free for hobby websites.

Gryzor

Quote from: redbox on 13:29, 21 May 25
You could try blocking the Alibaba IP ranges and see if anyone complains (redirect to a wait page, as this usually makes the bots lose interest).

It's a nightmare whack-a-mole scenario to do it yourself though. Cloudflare as a reverse proxy is much easier and free for hobby websites.
I had found this page, wishing it had a complete list of IPs to easily block, so I dropped it... but reading your message again right now I had an idea and I asked ChatGPT to parse the pages and create a blocklist :D Here's the result:

<RequireAll>
    Require all granted
    Require not ip 47.74.0.0/15
    Require not ip 47.235.0.0/16
    Require not ip 47.250.0.0/16
    Require not ip 47.88.0.0/14
    Require not ip 47.56.0.0/15
    Require not ip 155.102.0.0/16
    Require not ip 163.181.0.0/16
    Require not ip 47.52.0.0/16
    Require not ip 147.139.0.0/16
    Require not ip 139.95.0.0/16
    Require not ip 72.254.0.0/16
    Require not ip 61.200.84.0/24
    Require not ip 47.89.91.0/24
    Require not ip 47.89.112.0/24
    Require not ip 111.108.151.176/28
</RequireAll>


Can't check if indeed it included them all, but I'm going to try it later on the server...
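
One way to sanity-check a list like this against the addresses already seen in the logs is to test whether a given IPv4 address falls inside a CIDR range. A small sketch using plain shell arithmetic; the function names ip_to_int and ip_in_cidr are made up for illustration, and it handles IPv4 only:

```shell
# Hypothetical helper: convert a dotted IPv4 address to a 32-bit integer.
ip_to_int() (
    IFS=.
    set -- $1                        # split the address on dots
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
)

# Hypothetical helper: succeed if IPv4 address $1 lies inside CIDR range $2.
ip_in_cidr() (
    net=${2%/*}                      # network part, e.g. 47.74.0.0
    bits=${2#*/}                     # prefix length, e.g. 15
    mask=$(( (4294967295 << (32 - bits)) & 4294967295 ))
    [ $(( $(ip_to_int "$1") & mask )) -eq $(( $(ip_to_int "$net") & mask )) ]
)
```

For example, ip_in_cidr 47.75.12.34 47.74.0.0/15 && echo blocked prints "blocked", because 47.74.0.0/15 spans 47.74.0.0 to 47.75.255.255.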

Regarding Cloudflare: I only use it as a reverse proxy for some containers of mine, and I wonder how easy it would be to incorporate it here... in that case, do the standard non-human checks apply? I haven't really worked with it much.
