Vol. 3 No. 2 (2023): Journal of AI-Assisted Scientific Discovery

Web Scraping Techniques for Data Collection: Studying web scraping techniques for collecting data from websites and online sources for analysis and research purposes

Dr. Juan Gómez-Olmos
Associate Professor of Computer Science, University of Jaén, Spain

Published 10-08-2023

Keywords

  • Web scraping
  • Data collection

How to Cite

[1] Dr. Juan Gómez-Olmos, “Web Scraping Techniques for Data Collection: Studying web scraping techniques for collecting data from websites and online sources for analysis and research purposes”, Journal of AI-Assisted Scientific Discovery, vol. 3, no. 2, pp. 170–179, Aug. 2023, Accessed: Nov. 22, 2024. [Online]. Available: https://scienceacadpress.com/index.php/jaasd/article/view/119

Abstract

Web scraping, a method of extracting information from websites, is increasingly used to collect data for research and analysis. This paper surveys the techniques used in web scraping for data collection, examining their strengths, limitations, and ethical considerations. It also discusses the challenges researchers face and offers recommendations for making web scraping more efficient and effective. The findings highlight the importance of web scraping as a data collection method and its impact on research and analysis.
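
As an illustration of what "extracting information from websites" can look like in practice, the following minimal Python sketch pulls links out of an HTML page and keeps only those a site's robots.txt permits, which touches on one of the ethical considerations mentioned above. It is not taken from the paper: the class and function names (`LinkExtractor`, `allowed_by_robots`), the sample HTML, the sample robots.txt, the `research-bot` user agent, and the `example.org` base URL are all assumptions made for the example. The inputs are inlined so the sketch runs without network access.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class LinkExtractor(HTMLParser):
    """Collect (href, text) pairs from <a> tags in an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> tag currently open, if any
        self._text = []     # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def allowed_by_robots(robots_txt, user_agent, url):
    """Return True if the given robots.txt text permits fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical inputs, inlined so the sketch runs offline.
SAMPLE_HTML = """
<html><body>
  <a href="/articles/1">First article</a>
  <a href="/private/admin">Admin</a>
</body></html>
"""
SAMPLE_ROBOTS = "User-agent: *\nDisallow: /private/\n"
BASE = "https://example.org"

extractor = LinkExtractor()
extractor.feed(SAMPLE_HTML)

# Keep only the links that robots.txt allows this user agent to fetch.
permitted = [
    (href, text)
    for href, text in extractor.links
    if allowed_by_robots(SAMPLE_ROBOTS, "research-bot", BASE + href)
]
print(permitted)
```

In a real collection pipeline the HTML and robots.txt would be fetched from the target site rather than inlined, and a researcher would typically also rate-limit requests and review the site's terms of use; those steps are left out here to keep the sketch short.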
