Site Lab aims to be an open-source replacement for website analysis tools such as BuiltWith, NerdyData, and DataNyze.
Site Lab is a Ruby on Rails application. It uses PostgreSQL as its database and Redis + Sidekiq for background processing (background processing still in the works).
There is a live demo on Heroku at http://sitelabdemo.herokuapp.com/. Please be a good citizen with respect to the demo (e.g. try it with real-world data and submit issues if you find a problem).
The demo is currently a few commits behind, as I haven't had a chance to set up Redis/Sidekiq on Heroku (now used in Site Lab for background processing).
Right now, it's fairly simple:
- The MetaInspector Gem retrieves some basic info about the site/URL
- There is a "Technology" model which stores regular expressions
- Technologies are matched against the source of the sites/URLs
- Much of the processing now happens in the background (via Sidekiq)
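
As a rough illustration of the matching step described above, here is a minimal sketch in plain Ruby. The `Technology` struct, the sample patterns, and the `detect_technologies` helper are all hypothetical stand-ins for illustration; they are not the app's actual model or seed data, and the real app fetches the page source via MetaInspector rather than a hard-coded string.

```ruby
# Hypothetical stand-in for the "Technology" model: a name plus a regex.
Technology = Struct.new(:name, :pattern)

# Illustrative patterns only (assumptions, not the project's seed data).
TECHNOLOGIES = [
  Technology.new("jQuery", /jquery[.\-\d]*(?:\.min)?\.js/i),
  Technology.new("Google Analytics", /google-analytics\.com\/(?:ga|analytics)\.js/i),
  Technology.new("WordPress", /wp-content\//i)
].freeze

# Return the names of all technologies whose pattern matches the page source.
def detect_technologies(source, technologies = TECHNOLOGIES)
  technologies.select { |tech| source.match?(tech.pattern) }.map(&:name)
end

# A small, made-up page source to match against.
html = <<~HTML
  <script src="/assets/jquery-1.11.1.min.js"></script>
  <link rel="stylesheet" href="/wp-content/themes/demo/style.css">
HTML

puts detect_technologies(html).inspect
```

Keeping the patterns as data (rather than hard-coding checks) is what lets new technologies be added without touching the matching code.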
More complex analysis is in the works.
- Current commercial SaaS products in this space are quite expensive (i.e. > $200/month), mainly because they aim to be lead-generation services. I'm not disputing their value; I just have smaller needs and a smaller budget.
- I have some specialized needs (i.e. I want to search for specific/niche technologies), so I like the flexibility of being able to define my own parameters.
It's a Rails 4.1 app, so you'll need a dev environment that supports that (probably RVM). You'll also need Redis installed and running (probably via Homebrew).
- Clone the repo
- Edit the database.yml file with your info
- `bundle install` to install gems
- `bundle exec rake db:create` to create the DB(s)
- `bundle exec rake db:seed` to load the seed data
- `foreman start -p 3000` to start the rails server & sidekiq locally on port 3000
While you can add sites/URLs one by one in the app, most use cases will involve importing large sets of URLs from files or external sites. With that in mind, I've started a set of Rake tasks for importing URLs. Currently, the set includes:
- Importing all startups from AngelList for a given market
- Importing all startup/product URLs listed on http://www.producthunt.com
- Importing URLs from a text file (placed in app/import)
Run `rake -T` to see the tasks and required parameters.
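
To give a feel for what a text-file import might involve, here is a minimal sketch of the URL clean-up step in plain Ruby. The helper names `normalize_url` and `urls_from_lines` are hypothetical, and the rules (skip blank lines and `#` comments, default to `http://`, drop duplicates) are assumptions for illustration rather than the behavior of the project's actual Rake tasks.

```ruby
require "uri"

# Turn one line of a text file into an http(s) URL string,
# or return nil if the line is blank, a comment, or not parseable.
def normalize_url(line)
  text = line.strip
  return nil if text.empty? || text.start_with?("#")

  # Assumption: bare hostnames default to plain http.
  text = "http://#{text}" unless text =~ %r{\Ahttps?://}i
  uri = URI.parse(text)
  return nil if uri.host.nil? || uri.host.empty?

  uri.to_s
rescue URI::InvalidURIError
  nil
end

# Normalize every line, drop unusable ones, and remove duplicates
# while preserving the original order.
def urls_from_lines(lines)
  lines.map { |line| normalize_url(line) }.compact.uniq
end

sample = ["example.com", "  https://rubyonrails.org/ ", "", "# a comment", "example.com"]
puts urls_from_lines(sample)
```

Inside a Rake task, the same helper could be fed from a file with something like `urls_from_lines(File.readlines(path))`, where `path` points at a file under app/import.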
- Grab response header to determine web server/stack (a curl -I)
- Add some background processing
- Add ElasticSearch for searching meta information
- Add auto-import of external links
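
For the response-header item in the list above, one possible approach is a HEAD request, which is roughly what `curl -I` does. The sketch below uses Ruby's standard `Net::HTTP`; the helper names `fetch_headers` and `stack_hints`, and the choice of which headers to inspect, are assumptions for illustration and not part of the app.

```ruby
require "net/http"
require "uri"

# Fetch only the response headers via a HEAD request.
# Returns a hash with downcased header names and arrays of values.
def fetch_headers(url)
  uri = URI.parse(url)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
    response = http.head(uri.path.empty? ? "/" : uri.path)
    response.to_hash
  end
end

# Pick out the headers that commonly hint at the web server or stack.
# Assumption: only "server" and "x-powered-by" are inspected here.
def stack_hints(headers)
  %w[server x-powered-by].each_with_object({}) do |name, hints|
    value = Array(headers[name]).first
    hints[name] = value if value
  end
end

# Example with made-up headers (no network access needed):
sample_headers = {
  "server" => ["nginx/1.4.6"],
  "x-powered-by" => ["Phusion Passenger"],
  "content-type" => ["text/html; charset=utf-8"]
}
puts stack_hints(sample_headers).inspect

# Against a live site, this would be: stack_hints(fetch_headers("http://example.com/"))
```

Many sites strip or fake these headers, so the result is best treated as a hint to combine with the regex-based matching rather than a definitive answer.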