Posts by xwe10

    Just use "ThreadPoolExecutor" with a Context Manager, see here: https://docs.python.org/3/library/concurrent.futures.html


    Quote:

    Code
    If max_workers is None or not given, it will default to the number of processors on the machine, multiplied by 5, assuming that ThreadPoolExecutor (https://docs.python.org/3/library/concurrent.futures.html#concurrent.futures.ThreadPoolExecutor) is often used to overlap I/O instead of CPU work and the number of workers should be higher than the number of workers for ProcessPoolExecutor (https://docs.python.org/3/library/concurrent.futures.html#concurrent.futures.ProcessPoolExecutor).
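    A minimal sketch of that pattern, using ThreadPoolExecutor as a context manager. The URLs and the fetch function are placeholders, not part of the monitor code:

```python
from concurrent.futures import ThreadPoolExecutor

def fetch(url):
    # Stand-in for the real HTTP request; here it just echoes the URL.
    return "fetched " + url

urls = ["https://example.com/a", "https://example.com/b"]

# The context manager calls shutdown(wait=True) on exit, so all
# submitted tasks finish before the block is left.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(fetch, urls))

print(results)
```

    pool.map returns results in the order of the inputs, even though the calls run concurrently.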

    Is it a problem when I create threads inside a thread?

    For example, say I have a site and I want to scrape multiple queries at once; can I create a thread for each query?


    Is the ThreadPoolExecutor better than the ThreadPool?

    Just a thought: The main difference between your laptop and your VPS is that the available host CPU cores are yours w.r.t. the former, while they're shared among all users w.r.t. the latter (and are very likely overbooked as well). I would expect a somewhat better behaviour using an RS ("root server") where the host CPU cores are allocated differently.

    I think I found the problem: I am creating too many threads of one type of site monitor.


    What is the maximum number of threads that I should use for I/O-bound work?

    How do you print out your statistics / measure time?

    The default print call in Python is not a thread-safe operation.


    Do you use any synchronisation mechanisms?
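    One common synchronisation mechanism for this, sketched minimally: serialise output from all threads with a shared threading.Lock. (The demo collects into a list instead of actually printing, so the result is checkable; the names are placeholders.)

```python
import threading

output_lock = threading.Lock()
lines = []  # collected output, standing in for stdout

def log_line(msg):
    # Only one thread at a time may hold the lock, so entries from
    # different threads can never interleave mid-line.
    with output_lock:
        lines.append(msg)

threads = [threading.Thread(target=log_line, args=("worker %d" % i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(lines))
```

    The logging module does essentially this internally (each handler has its own lock), which is one reason to prefer it over print in threaded code.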

    I am using the normal print function from each thread; I didn't know that it's not thread-safe.


    I am also using the logging library to create log files where I also print the statistics. Do you think this could also be a problem?


    But why does it work on my laptop and not on the server? That wouldn't make any sense, no?



    Code
    import logging

    def create(name, level=logging.DEBUG):
        """Factory function to create a logger"""
        handler = logging.FileHandler("logs/" + name + ".log", mode="w")
        handler.setFormatter(logging.Formatter('%(asctime)s - %(name)s - %(message)s'))
        handler.setLevel(level)
        logger = logging.getLogger(name)
        logger.setLevel(level)
        logger.addHandler(handler)
        return logger
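    A side note on that factory: logging handlers are thread-safe, but logging.getLogger(name) returns the same logger object on every call, so calling create() twice with the same name stacks a second FileHandler onto it and every message gets written twice. A sketch of a guard against that (StreamHandler stands in for the FileHandler so no log directory is needed):

```python
import logging

def create(name, level=logging.DEBUG):
    """Factory that reuses an already-configured logger instead of stacking handlers."""
    logger = logging.getLogger(name)
    if logger.handlers:
        # Already configured by an earlier call: return it unchanged.
        return logger
    handler = logging.StreamHandler()  # stand-in for logging.FileHandler(...)
    handler.setFormatter(logging.Formatter('%(asctime)s - %(name)s - %(message)s'))
    handler.setLevel(level)
    logger.setLevel(level)
    logger.addHandler(handler)
    return logger

a = create("demo")
b = create("demo")
print(a is b, len(a.handlers))
```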

    Could it be that you just "fire" too quickly and the queue / backlog gets stuck at some point? Is it really necessary for the application to scrape that often? Maybe your IP runs into some external limiter of the website you are scraping from?


    If you want to use that as a "sneaker sniper", a rate of 1 scrape every second, or every 5 seconds, should also be more than enough.

    I am requesting every 15 seconds and I am also using proxies; I don't think the website is the problem.


    This problem occurs with all websites that I monitor.

    Thank you for your help.


    Laptop:

    Docker Version: Docker version 20.10.22, build 3a2c30b

    Python Version: Python 3.10 (Same version because of the Docker Image)


    Processor:

    Device name: Lenovo

    Processor: Intel(R) Core(TM) i5-8350U CPU @ 1.70GHz 1.90 GHz

    Installed RAM: 8.00 GB (7.88 GB usable)

    System type: 64-bit operating system, x64-based processor


    Server:

    Docker Version: Docker version 20.10.22, build 3a2c30b

    Python Version: Python 3.10 (Same version because of the Docker Image)


    Name: VPS 2000 G10 (KVM vServer with 8 cores and 12 GB of memory)


    I also tried it without Docker and still had the same problem, so I don't think Docker is the problem.


    Example Site:


    Prodirectsoccer Upcoming Dunk Releases


    I am just requesting the site and then waiting 15 seconds with time.sleep() before requesting it again to check for any upcoming releases.
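    That polling loop, as a minimal sketch. check_releases and the interval parameter are placeholders (the real interval is the 15 seconds described above; the demo uses 0 so it runs instantly):

```python
import time

def monitor(check_releases, interval=15, iterations=None):
    """Call check_releases every `interval` seconds; stop after `iterations` polls."""
    results = []
    count = 0
    while iterations is None or count < iterations:
        results.append(check_releases())  # the request itself (I/O-bound)
        count += 1
        if iterations is None or count < iterations:
            time.sleep(interval)          # wait before the next request
    return results

# Demo with a stub check and zero interval.
out = monitor(lambda: "checked", interval=0, iterations=3)
print(out)
```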


    This takes 0.8-1 second on my laptop.
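    Assuming the per-request timing is measured in code, time.perf_counter is the appropriate clock for this kind of measurement (monotonic, high resolution) — a sketch with a stub workload:

```python
import time

def timed(fn):
    """Return (result, elapsed_seconds) for a single call to fn."""
    start = time.perf_counter()
    result = fn()
    elapsed = time.perf_counter() - start
    return result, elapsed

# Stub workload standing in for the real request.
result, elapsed = timed(lambda: sum(range(1000)))
print(result, elapsed >= 0.0)
```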


    I am not doing any calculations; everything is I/O-bound (just waiting for responses).



    Thank you


    No, I am located in Germany, and the server being slower than my laptop is exactly my problem. It doesn't make sense: it's the same code, yet on the server it runs much slower for no apparent reason and the response times are inconsistent. The server has more cores than my laptop, and the server's internet connection is also much faster.


    1: I can use threads, and they speed up my software because they are waiting most of the time (I/O-bound work).

    This can't be the problem, because the software works 100% on my laptop. I guess the server's vCores handle threads differently, because when I used processes I didn't have these problems on the server.

    This only partly makes sense.


    What programming language / runtime environment did you use?

    What functions did you use to run your software in different processes?

    What functions / libs did you use to shift this into threads?

    I use Python. I used the multiprocessing library for the processes in the past and switched to threads by using the threading library.
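    For context, such a switch can be almost a drop-in change, since threading.Thread and multiprocessing.Process share the same start/join interface — a sketch with a stub worker, not the actual monitor code:

```python
import threading

def worker(name, results):
    # Threads share memory with the main thread, so a plain list
    # works as a result store (appends are atomic under the GIL).
    results.append("monitor %d done" % name)

results = []
threads = [threading.Thread(target=worker, args=(i, results)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# With multiprocessing.Process the loop would look identical, but a
# plain list would NOT work: processes don't share memory, so results
# would need a multiprocessing.Queue or Manager().list() instead.
print(sorted(results))
```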


    The software is written by myself and monitors multiple websites for restocks of rare products.


    So most of the time the threads are just waiting for the website to respond (I/O-bound threads).

    Hello Netcup Community,


    Since a recent update of my software, where I switched from multiprocessing to multithreading, I have noticed a huge decrease in its performance.

    On my laptop, which is much weaker than the server performance-wise, the software takes only 1 second per request, while on the server it consistently varies between 6 and 12 seconds.


    The software uses multiple I/O-bound threads and runs in Docker.


    The server is a KVM vServer with 8 cores and 12 GB of memory.


    Could the V-cores of the KVM server be causing the problems?


    Here is an example thread that is doing I/O-bound work (waiting for the website to respond):


    2023-03-22 12_26_22-prodirectsoccer_release.log - monitor-service - Visual Studio Code.png

    My Laptop


    2023-03-22 12_27_01-prodirectsoccer_release.log - monitor-service - Visual Studio Code.png

    The Server