Problem with Multithreading on a KVM vServer

  • Hello Netcup Community,


    Since a recent update of my software, in which I switched from multiprocessing to multithreading, I have noticed a huge decrease in performance.

    On my laptop, which is much worse than the server performance-wise, the software takes only about 1 second per request, while on the server it varies between 6 and 12 seconds.


    The software uses multiple I/O-bound threads and runs in Docker.


    The server is a KVM vServer with 8 cores and 12 GB of memory.


    Could the vCores of the KVM server be causing the problems?


    Here is an example thread that is doing I/O-bound work (waiting for the website to respond):


    [Screenshot: prodirectsoccer_release.log timings in VS Code (my laptop)]

    [Screenshot: prodirectsoccer_release.log timings in VS Code (the server)]

  • You changed what, exactly? From exec calls to POSIX threads, or what should I imagine here?


    They should not.


    I switched from running parts of the software in multiple processes to running them in multiple threads to reduce memory consumption, because threads are more lightweight.

  • I switched from running parts of the software in multiple processes to running them in multiple threads to reduce memory consumption, because threads are more lightweight.

    This only partly makes sense.


    What programming language / runtime environment did you use?

    What functions did you use to run your software in different processes?

    What functions / libs did you use to shift this into threads?

  • This only partly makes sense.


    What programming language / runtime environment did you use?

    What functions did you use to run your software in different processes?

    What functions / libs did you use to shift this into threads?

    I use Python. In the past I used the multiprocessing library for the processes, and I switched to threads using the threading library.


    The software is written by me and monitors multiple websites for restocks of rare products.


    So most of the time the threads are just waiting for the website to respond (I/O-bound threads).
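
    Roughly, the kind of change described looks like this; the worker function and URL list below are placeholders for illustration, not the actual code:

    Code
    import threading
    import time

    URLS = ["https://example.com/a", "https://example.com/b"]  # placeholder list of monitored sites

    def monitor_site(url):
        # placeholder stand-in for the real monitor: mostly waiting on I/O
        time.sleep(1)
        print(f"checked {url}")

    # Before: one process per monitor
    #   import multiprocessing
    #   workers = [multiprocessing.Process(target=monitor_site, args=(url,)) for url in URLS]
    # After: one thread per monitor (lighter on memory, all sharing one interpreter)
    workers = [threading.Thread(target=monitor_site, args=(url,)) for url in URLS]
    for w in workers:
        w.start()
    for w in workers:
        w.join()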

  • Thanks.


    Some educated guesses:


    1st: Your implementation of threads effectively uses just one core and therefore doesn't speed up the process at all.

    More on this can be found here: https://towardsdatascience.com…ltithreading-9b62f9875a27


    The threads are just a way of handling blocked resources, not parallel execution (a small illustration follows after this post).


    2nd: Why is the server slower than your laptop?

    You told me that the program is fetching information from the internet.

    The server is located in Germany; your laptop probably isn't, and the websites you monitor probably aren't hosted in Germany either, which adds delay.


    In which country did you test this with your laptop and do you have a sample web page that you use?
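
    A small illustration of the first point (a standalone example, not the poster's code): in CPython, threads running pure-Python CPU-bound work are serialized by the GIL, while time spent waiting (I/O, sleep) does overlap:

    Code
    import threading
    import time

    def cpu_work():
        # CPU-bound: pure Python loop, serialized by the GIL
        sum(i * i for i in range(2_000_000))

    def io_work():
        # stand-in for I/O-bound work: just waiting, the GIL is released
        time.sleep(1)

    def run_in_threads(target, n=4):
        threads = [threading.Thread(target=target) for _ in range(n)]
        start = time.perf_counter()
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        return time.perf_counter() - start

    print("4 CPU-bound threads:", run_in_threads(cpu_work))  # roughly 4x the single-call time
    print("4 I/O-bound threads:", run_in_threads(io_work))   # roughly 1 second in total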

  • Thank you


    No, I'm located in Germany, and the fact that the server is slower than my laptop is exactly my problem. It doesn't make sense: it's the same code, yet on the server it is much slower for no apparent reason, and the response times are inconsistent. The server has more cores than my laptop, and its internet connection is also much faster.


    1: I can use threads and they do speed up my software, because they are waiting most of the time (I/O-bound work).

    This can't be the problem, because the software works 100% on my laptop. So I guess the vCores of the server handle threads differently, because when I used processes I didn't have these problems on the server.

  • This can't be the problem, because the software works 100% on my laptop. So I guess the vCores of the server handle threads differently, because when I used processes I didn't have these problems on the server.

    Between the vCores and the Python threads there are many layers, e.g. the Linux kernel (vs. Windows on your system, I guess), Python itself, Docker, and so on.


    Trust me on this.

    From your answer I take it that you lack an understanding of how multithreading in Python scales with more cores.


    I need some more information, if you are willing to work this through:


    A version and machine dump.

    What Docker and Python versions are you using on your laptop and server?

    What kind of server do you have?

    What's the processor in your laptop?

    What is a sample site?

    Do you do any calculations on the responses? Is there any database involvement, etc.?

  • Thank you for your help.


    Laptop:

    Docker Version: Docker version 20.10.22, build 3a2c30b

    Python Version: Python 3.10 (Same version because of the Docker Image)


    Processor:

    Device name: Lenovo

    Processor: Intel(R) Core(TM) i5-8350U CPU @ 1.70GHz 1.90 GHz

    Installed RAM: 8.00 GB (7.88 GB usable)

    System type: 64-bit operating system, x64-based processor


    Server:

    Docker Version: Docker version 20.10.22, build 3a2c30b

    Python Version: Python 3.10 (Same version because of the Docker Image)


    Name: VPS 2000 G10 (KVM vServer with 8 cores and 12 GB of memory)


    I also tried it without Docker and still had the same problem; I don't think Docker is the problem.


    Example Site:


    Prodirectsoccer Upcoming Dunk Releases


    I'm just requesting the site and then waiting 15 seconds with time.sleep() before requesting it again to check for any upcoming releases.


    This takes 0.8-1 second on my laptop.


    I'm not doing any calculations; everything is I/O-bound (just waiting for stuff).
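
    A rough sketch of the loop described above; the names and the use of the requests library are assumptions, not the original code:

    Code
    import time
    import requests

    def monitor(url, interval=15):
        """Request the page, report the timing, then wait before the next check."""
        while True:
            start = time.perf_counter()
            response = requests.get(url, timeout=30)
            elapsed = time.perf_counter() - start
            print(f"{url}: HTTP {response.status_code} in {elapsed:.2f}s")
            time.sleep(interval)  # wait 15 seconds before requesting again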



  • I'm just requesting the site and then waiting 15 seconds with time.sleep() before requesting it again to check for any upcoming releases.

    How do you print out your statistics / measure time?

    The default print call in Python is not a thread-safe operation.


    Do you use any synchronisation mechanisms?
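
    One way to keep per-thread timing output unambiguous, as a sketch (not taken from either poster's code): take timestamps with time.perf_counter() and serialize the output with a lock, or hand it to the logging module, whose handlers lock internally:

    Code
    import threading
    import time

    print_lock = threading.Lock()

    def timed(label, func, *args, **kwargs):
        """Run func, measure how long it took, and print the result without interleaving."""
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        with print_lock:  # only one thread writes at a time
            print(f"{label}: {elapsed:.2f}s")
        return result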

  • Could it be that you just "fire" too quickly and the queue / backlog gets stuck at some point? Is it really necessary for the application to scrape that often? Maybe your IP runs into some external limiter of the website you are scraping from?


    If you want to use that as a "sneaker sniper", a rate of one scrape every second, or even every 5 seconds, should be more than enough.


  • Could it be that you just "fire" too quickly and the queue / backlog gets stuck at some point? Is it really necessary for the application to scrape that often? Maybe your IP runs into some external limiter of the website you are scraping from?


    If you want to use that as a "sneaker sniper", a rate of one scrape every second, or even every 5 seconds, should be more than enough.

    I'm requesting every 15 seconds and I'm also using proxies; I don't think the website is the problem.


    This problem occurs with all websites that I monitor.

  • How do you print out your statistics / measure time?

    The default print call in Python is not a thread-safe operation.


    Do you use any synchronisation mechanisms?

    I'm using the normal print function from each thread; I didn't know that it's not thread-safe.


    I'm also using the logging library to create log files where I also write the statistics. Do you think this could also be a problem?


    But why does it work on my laptop and not on the server? That wouldn't make any sense, would it?



    Code
    import logging

    def create(name, level=logging.DEBUG):
        """Factory function to create a per-site logger that writes to its own file."""
        # One file handler per logger; mode="w" overwrites the previous run's log
        handler = logging.FileHandler("logs/" + name + ".log", mode="w")
        handler.setFormatter(logging.Formatter('%(asctime)s - %(name)s - %(message)s'))
        handler.setLevel(level)

        logger = logging.getLogger(name)
        logger.setLevel(level)
        logger.addHandler(handler)
        return logger
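
    For what it's worth, handlers in the standard logging module take an internal lock when emitting, so calling such a logger from several threads should be safe. A hypothetical usage, assuming the logs/ directory exists:

    Code
    # Hypothetical usage from inside a monitor thread (not taken from the post above):
    logger = create("prodirectsoccer_release")
    logger.debug("request finished in %.2f s", 0.93)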
  • Name: VPS 2000 G10 (KVM vServer with 8 cores and 12 GB of memory)

    Just a thought: The main difference between your laptop and your VPS is that the host CPU cores are dedicated to you on the former, while on the latter they are shared among all users (and are very likely overbooked as well). I would expect somewhat better behaviour with an RS ("root server"), where the host CPU cores are allocated differently.

  • Just a thought: The main difference between your laptop and your VPS is that the host CPU cores are dedicated to you on the former, while on the latter they are shared among all users (and are very likely overbooked as well). I would expect somewhat better behaviour with an RS ("root server"), where the host CPU cores are allocated differently.

    I think I found the problem: I'm creating too many threads of one type of site monitor.


    What is the maximum number of threads that I should use for I/O-bound work?

  • Just use "ThreadPoolExecutor" with a Context Manager, see here: https://docs.python.org/3/library/concurrent.futures.html


    Quote:

    If max_workers is None or not given, it will default to the number of processors on the machine, multiplied by 5, assuming that ThreadPoolExecutor is often used to overlap I/O instead of CPU work and the number of workers should be higher than the number of workers for ProcessPoolExecutor.
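
    A minimal sketch of that suggestion applied to the monitoring use case; fetch_once and the URL list are placeholders, not the poster's code:

    Code
    import concurrent.futures
    import urllib.request

    URLS = ["https://example.com/a", "https://example.com/b", "https://example.com/c"]

    def fetch_once(url):
        # one blocking HTTP request; the thread just waits for the response
        with urllib.request.urlopen(url, timeout=30) as response:
            return url, response.status

    # The context manager waits for the submitted work and shuts the pool down cleanly.
    with concurrent.futures.ThreadPoolExecutor(max_workers=8) as executor:
        for url, status in executor.map(fetch_once, URLS):
            print(url, status)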
  • Just use "ThreadPoolExecutor" with a Context Manager, see here: https://docs.python.org/3/library/concurrent.futures.html


    Quote:

    If max_workers is None or not given, it will default to the number of processors on the machine, multiplied by 5, assuming that ThreadPoolExecutor is often used to overlap I/O instead of CPU work and the number of workers should be higher than the number of workers for ProcessPoolExecutor.

    Is it a problem if I create threads inside a thread?

    For example, if I have a site and want to scrape multiple queries at once, can I create a thread for each query?


    Is the ThreadPoolExecutor better than the ThreadPool?

  • Another approach / suggestion:

    Try whether the HTTP(S) keyword monitor of https://github.com/louislam/uptime-kuma could be something for you.


    It has a very nice GUI and tons of options, including proxy setup. It also brings a huge number of notification options, including Telegram and every (business) messenger or push notification provider I could think of. It also offers a nice dashboard and an inversion option.
