Vol. 19, No. 12, December 31, 2025
DOI: 10.3837/tiis.2025.12.012
Abstract
With the rapid development of the Internet, a vast amount of data is being generated, making various types of information easily accessible. Consequently, big data analysis, which involves collecting, storing, and processing data and making predictions from it, has become increasingly important. Web crawlers have gained attention as tools for extracting data from specific web pages. They are used in various fields, including price comparison shopping, Search Engine Optimization (SEO), and Rich Site Summary (RSS) aggregation. Web crawlers rely on either static or dynamic crawling methods. Notable web crawlers include Scrapy, Selenium, BeautifulSoup, and Playwright, each designed to handle either static or dynamic web pages effectively. In this paper, we focus on improving the execution performance of these crawlers by applying two tuning techniques: parallel processing and asynchronous processing. To evaluate their performance, we use four key metrics: Time per Image (TPI), Images per Second (IPS), CPU utilization, and memory consumption. Through controlled experiments across various web page configurations, we demonstrate how each tuning method affects the execution efficiency and system resource usage of different crawler architectures. Our findings highlight the practical trade-offs between performance and resource efficiency, providing useful insights for applying crawler optimization strategies to real-world data collection tasks.
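To make the two tuning techniques and the two throughput metrics concrete, the following is a minimal Python sketch, not the authors' benchmark. It assumes TPI is elapsed time divided by the number of images and IPS is its reciprocal, which is the natural reading of the metric names but is not spelled out in the abstract. The download is simulated with a fixed delay, and every name in it (`fetch_image`, `run_parallel`, `SIMULATED_LATENCY_S`, the example URLs) is hypothetical. CPU utilization and memory consumption are not measured here.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

SIMULATED_LATENCY_S = 0.05  # assumed per-image delay; stands in for network I/O


def fetch_image(url: str) -> str:
    """Hypothetical blocking download: sleeps instead of hitting the network."""
    time.sleep(SIMULATED_LATENCY_S)
    return url


async def fetch_image_async(url: str) -> str:
    """Hypothetical non-blocking download: yields to the event loop while waiting."""
    await asyncio.sleep(SIMULATED_LATENCY_S)
    return url


def metrics(n_images: int, elapsed_s: float) -> dict:
    """Assumed definitions: TPI = elapsed / images, IPS = images / elapsed."""
    return {"tpi": elapsed_s / n_images, "ips": n_images / elapsed_s}


def run_sequential(urls):
    """Baseline: one download at a time."""
    start = time.perf_counter()
    results = [fetch_image(u) for u in urls]
    return metrics(len(results), time.perf_counter() - start)


def run_parallel(urls, workers=4):
    """Parallel processing: a thread pool overlaps the blocking waits."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(fetch_image, urls))
    return metrics(len(results), time.perf_counter() - start)


async def _gather_all(urls):
    # Schedule all coroutines at once and wait for every result.
    return await asyncio.gather(*(fetch_image_async(u) for u in urls))


def run_async(urls):
    """Asynchronous processing: a single thread interleaves all downloads."""
    start = time.perf_counter()
    results = asyncio.run(_gather_all(urls))
    return metrics(len(results), time.perf_counter() - start)


if __name__ == "__main__":
    urls = [f"https://example.com/img/{i}.png" for i in range(8)]
    for name, runner in [
        ("sequential", run_sequential),
        ("parallel", run_parallel),
        ("async", run_async),
    ]:
        m = runner(urls)
        print(f"{name:10s} TPI={m['tpi']:.4f} s  IPS={m['ips']:.1f}")
```

With a purely I/O-bound workload like this simulated one, both tuned variants should report a lower TPI and a higher IPS than the sequential baseline. Real crawlers add parsing and rendering costs, so the gap depends on the page and the tool.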
Cite this article
[IEEE Style]
M. Kim and S. Jeon, "Performance Optimization of Web Crawlers via Parallel and Asynchronous Processing," KSII Transactions on Internet and Information Systems, vol. 19, no. 12, pp. 4415-4436, 2025. DOI: 10.3837/tiis.2025.12.012.
[ACM Style]
Min-Sun Kim and Sanghoon Jeon. 2025. Performance Optimization of Web Crawlers via Parallel and Asynchronous Processing. KSII Transactions on Internet and Information Systems, 19, 12, (2025), 4415-4436. DOI: 10.3837/tiis.2025.12.012.
[BibTeX Style]
@article{tiis:105405, title="Performance Optimization of Web Crawlers via Parallel and Asynchronous Processing", author="Min-Sun Kim and Sanghoon Jeon", journal="KSII Transactions on Internet and Information Systems", DOI={10.3837/tiis.2025.12.012}, volume={19}, number={12}, year="2025", month={December}, pages={4415-4436}}