Overview:
Gerapy is a distributed crawler management framework built on Scrapy, Scrapyd, Scrapyd-Client, Scrapyd-API, Django, and Vue.js. It lets users create configurable projects, automatically generate Scrapy code, and manage the deployment and monitoring of crawl jobs. Gerapy requires Python 3.x and can be installed via pip.
Features:
- Distributed Crawler Management: Gerapy allows users to manage and distribute crawler tasks across multiple machines.
- Configurable Projects: Users can create configurable projects and automatically generate Scrapy code.
- Deployment and Monitoring: Gerapy provides tools for deploying projects and monitoring job progress.
- Docker Support: Gerapy can be run in a Docker container for easy setup and deployment.
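For the Docker route, a run command can be sketched as follows; the germey/gerapy image name and the /app/gerapy mount path are assumptions based on common Gerapy setups, so check the image you actually use:

```shell
# Run Gerapy in a container (image name and workspace path are assumptions):
# mount a host directory as the workspace and expose the web UI on port 8000.
docker run -d --name gerapy \
  -v ~/gerapy:/app/gerapy \
  -p 8000:8000 \
  germey/gerapy
```

After the container starts, the web UI should be reachable at http://localhost:8000, the same address as a local install.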
Installation:
To install Gerapy, use the following command:
pip install gerapy
After installation, follow these steps to run the Gerapy server:
- Initialize the workspace by running the command:
gerapy init
- Move into the created gerapy folder:
cd gerapy
- Initialize the database:
gerapy migrate
- Create a superuser:
gerapy createsuperuser
- Start the server:
gerapy runserver
You can now access Gerapy by visiting http://localhost:8000. The admin management backend can be accessed at http://localhost:8000/admin.
To make Gerapy accessible from other machines, bind the server to all interfaces:
gerapy runserver 0.0.0.0:8000
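Under the hood, Gerapy manages deployments by talking to Scrapyd's JSON HTTP API (via Scrapyd-API). As a rough illustration of that interaction, here is a minimal client sketch using only the standard library; the helper names and the default http://localhost:6800 address are assumptions for illustration, not part of Gerapy itself:

```python
# Minimal sketch of a Scrapyd JSON API client (helper names are hypothetical).
import json
from urllib.parse import urljoin
from urllib.request import urlopen

def scrapyd_endpoint(base, action):
    """Build a Scrapyd JSON API URL, e.g. .../listprojects.json."""
    return urljoin(base, f"/{action}.json")

def list_projects(base="http://localhost:6800"):
    """Ask a running Scrapyd instance for its deployed project names."""
    with urlopen(scrapyd_endpoint(base, "listprojects")) as resp:
        return json.load(resp)["projects"]
```

Calling list_projects() requires a Scrapyd server listening on the target host; Gerapy wraps this kind of call behind its web UI so you rarely issue it by hand.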
Summary:
Gerapy is a distributed crawler management framework that simplifies the process of creating and deploying web crawlers. It provides features such as configurable projects, automated code generation, and deployment management. With its support for Docker, Gerapy offers a convenient solution for setting up and running crawler tasks.