Initial commit

TheTechRobo 2022-05-17 21:36:37 -04:00
commit b5b36df745
14 changed files with 4325 additions and 0 deletions

9
.gitignore vendored Normal file

@ -0,0 +1,9 @@
*~
*.pyc
wget-lua
wget-at
STOP
BANNED
data/
test/
duplicate-urls.txt

3
Dockerfile Normal file

@ -0,0 +1,3 @@
FROM atdr.meo.ws/archiveteam/grab-base
COPY . /grab
RUN ln -fs /usr/local/bin/wget-lua /grab/wget-at

1053
JSON.lua Normal file

File diff suppressed because it is too large

24
LICENSE Normal file

@ -0,0 +1,24 @@
This is free and unencumbered software released into the public domain.
Anyone is free to copy, modify, publish, use, compile, sell, or
distribute this software, either in source code form or as a compiled
binary, for any purpose, commercial or non-commercial, and by any
means.
In jurisdictions that recognize copyright laws, the author or authors
of this software dedicate any and all copyright interest in the
software to the public domain. We make this dedication for the benefit
of the public at large and to the detriment of our heirs and
successors. We intend this dedication to be an overt act of
relinquishment in perpetuity of all present and future rights to this
software under copyright law.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR
OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
OTHER DEALINGS IN THE SOFTWARE.
For more information, please refer to <http://unlicense.org>

184
README.md Normal file

@ -0,0 +1,184 @@
urls-grab
=============
More information about the archiving project can be found on the ArchiveTeam wiki: [URLs](http://archiveteam.org/index.php?title=URLs)
Setup instructions
=========================
Be sure to replace `YOURNICKHERE` with the nickname that you want to be shown as, on the tracker. You don't need to register it, just pick a nickname you like.
In most of the below cases, there will be a web interface running at http://localhost:8001/. If you don't know or care what this is, you can just ignore it—otherwise, it gives you a fancy view of what's going on.
**If anything goes wrong while running the commands below, please scroll down to the bottom of this page. There's troubleshooting information there.**
Running with a warrior
-------------------------
Follow the [instructions on the ArchiveTeam wiki](http://archiveteam.org/index.php?title=Warrior) for installing the Warrior, and select the "URLs" project in the Warrior interface.
Running without a warrior
-------------------------
To run this outside the warrior, clone this repository, cd into its directory and run:

    python3 -m pip install setuptools wheel
    python3 -m pip install --upgrade seesaw zstandard requests
    ./get-wget-lua.sh

then start downloading with:

    run-pipeline3 pipeline.py --concurrent 2 YOURNICKHERE

For more options, run:

    run-pipeline3 --help

If you don't have root access and/or your version of pip is very old, you can replace "pip install --upgrade seesaw" with:

    wget https://raw.github.com/pypa/pip/master/contrib/get-pip.py ; python3 get-pip.py --user ; ~/.local/bin/pip3 install --upgrade --user seesaw

so that pip and seesaw are installed in your home, then run

    ~/.local/bin/run-pipeline3 pipeline.py --concurrent 2 YOURNICKHERE

Running multiple instances on different IPs
-------------------------------------------
This feature requires seesaw version 0.0.16 or greater. Use `pip install --upgrade seesaw` to upgrade.

Use the `--context-value` argument to pass in `bind_address=123.4.5.6` (replace the IP address with your own).

Example of running 2 threads, no web interface, and Wget binding of IP address:

    run-pipeline3 pipeline.py --concurrent 2 YOURNICKHERE --disable-web-server --context-value bind_address=123.4.5.6

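For readers curious how the value ends up on the Wget command line: `pipeline.py` checks whether a `bind_address` name exists in its module globals and, if so, appends `--bind-address` to the Wget arguments. The following is a minimal sketch of that check, not the pipeline's actual code. The function name `bind_address_args` and the `context` dictionary are hypothetical stand-ins for the pipeline's `globals()`.

```python
def bind_address_args(context):
    """Return the extra Wget arguments for an optional bind address.

    `context` is a hypothetical stand-in for the module globals that
    pipeline.py inspects; it is not part of seesaw's API.
    """
    if 'bind_address' in context:
        return ['--bind-address', context['bind_address']]
    return []


# With the context value set, Wget is told which local address to use.
print(bind_address_args({'bind_address': '123.4.5.6'}))
# Without it, no extra arguments are added.
print(bind_address_args({}))
```
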
Distribution-specific setup
-------------------------
### For Debian/Ubuntu:
Package `libzstd-dev` version 1.4.4 is required which is currently available from `buster-backports`.
adduser --system --group --shell /bin/bash archiveteam
echo deb http://deb.debian.org/debian buster-backports main contrib > /etc/apt/sources.list.d/backports.list
apt-get update \
&& apt-get install -y git-core libgnutls-dev lua5.1 liblua5.1-0 liblua5.1-0-dev screen bzip2 zlib1g-dev flex autoconf autopoint texinfo gperf lua-socket rsync automake pkg-config python3-dev python3-pip build-essential \
&& apt-get -t buster-backports install zstd libzstd-dev libzstd1
python3 -m pip install setuptools wheel
python3 -m pip install --upgrade seesaw zstandard requests
su -c "cd /home/archiveteam; git clone https://github.com/ArchiveTeam/urls-grab.git; cd urls-grab; ./get-wget-lua.sh" archiveteam
screen su -c "cd /home/archiveteam/urls-grab/; run-pipeline3 pipeline.py --concurrent 2 --address '127.0.0.1' YOURNICKHERE" archiveteam
[... ctrl+A D to detach ...]
In __Debian Jessie, Ubuntu 18.04 Bionic and above__, the `libgnutls-dev` package was renamed to `libgnutls28-dev`. So, you need to do the following instead:
adduser --system --group --shell /bin/bash archiveteam
echo deb http://deb.debian.org/debian buster-backports main contrib > /etc/apt/sources.list.d/backports.list
apt-get update \
&& apt-get install -y git-core libgnutls28-dev lua5.1 liblua5.1-0 liblua5.1-0-dev screen bzip2 zlib1g-dev flex autoconf autopoint texinfo gperf lua-socket rsync automake pkg-config python3-dev python3-pip build-essential \
&& apt-get -t buster-backports install zstd libzstd-dev libzstd1
[... pretty much the same as above ...]
Wget-lua is also available on [ArchiveTeam's PPA](https://launchpad.net/~archiveteam/+archive/wget-lua) for Ubuntu.
### For CentOS:
Ensure that you have the CentOS equivalent of bzip2 installed as well. You will need the EPEL repository to be enabled.
yum -y groupinstall "Development Tools"
yum -y install gnutls-devel lua-devel python-pip zlib-devel zstd libzstd-devel git-core gperf lua-socket luarocks texinfo git rsync gettext-devel
pip install --upgrade seesaw
[... pretty much the same as above ...]
Tested with EL7 repositories.
### For Fedora:
The same as CentOS, but with "dnf" instead of "yum". Compiling has not been tested successfully so far.
### For openSUSE:
zypper install liblua5_1 lua51 lua51-devel screen python-pip libgnutls-devel bzip2 python-devel gcc make
pip install --upgrade seesaw
[... pretty much the same as above ...]
### For OS X:
You need Homebrew. Ensure that you have the OS X equivalent of bzip2 installed as well.
brew install python lua gnutls
pip install --upgrade seesaw
[... pretty much the same as above ...]
**There is a known issue with some packaged versions of rsync. If you get errors during the upload stage, urls-grab will not work with your rsync version.**
This supposedly fixes it:
alias rsync=/usr/local/bin/rsync
### For Arch Linux:
Ensure that you have the Arch equivalent of bzip2 installed as well.
1. Make sure you have `python2-pip` installed.
2. Install [the wget-lua package from the AUR](https://aur.archlinux.org/packages/wget-lua/).
3. Run `pip2 install --upgrade seesaw`.
4. Modify the run-pipeline script in seesaw to point at `#!/usr/bin/python2` instead of `#!/usr/bin/python`.
5. `useradd --system --group users --shell /bin/bash --create-home archiveteam`
6. `screen su -c "cd /home/archiveteam/urls-grab/; run-pipeline pipeline.py --concurrent 2 --address '127.0.0.1' YOURNICKHERE" archiveteam`
### For Alpine Linux:
apk add lua5.1 git python bzip2 bash rsync gcc libc-dev lua5.1-dev zlib-dev gnutls-dev autoconf flex make
python -m ensurepip
pip install -U seesaw
git clone https://github.com/ArchiveTeam/urls-grab
cd urls-grab; ./get-wget-lua.sh
run-pipeline pipeline.py --concurrent 2 --address '127.0.0.1' YOURNICKHERE
### For FreeBSD:
Honestly, I have no idea. `./get-wget-lua.sh` supposedly doesn't work due to differences in the `tar` that ships with FreeBSD. Another problem is the apparent absence of Lua 5.1 development headers. If you figure this out, please do let us know on IRC (irc.efnet.org #archiveteam).
Troubleshooting
=========================
Broken? These are some of the possible solutions:
### wget-lua was not successfully built
If you get errors about `wget.pod` or something similar, the documentation failed to compile - wget-lua, however, compiled fine. Try this:
cd get-wget-lua.tmp
mv src/wget ../wget-lua
cd ..
The `get-wget-lua.tmp` name may be inaccurate. If you have a folder with a similar but different name, use that instead and please let us know on IRC what folder name you had!
Optionally, if you know what you're doing, you may want to use wgetpod.patch.
### Problem with gnutls or openssl during get-wget-lua
Please ensure that gnutls-dev(el) and openssl-dev(el) are installed.
### ImportError: No module named seesaw
If you're sure that you followed the steps to install `seesaw`, permissions on your module directory may be set incorrectly. Try the following:
chmod o+rX -R /usr/local/lib/python2.7/dist-packages
### run-pipeline: command not found
Install `seesaw` using `pip2` instead of `pip`.
pip2 install seesaw
### Issues in the code
If you notice a bug and want to file a bug report, please use the GitHub issues tracker.
Are you a developer? Help write code for us! Look at our [developer documentation](http://archiveteam.org/index.php?title=Dev) for details.
### Other problems
Have an issue not listed here? Join us on IRC and ask! We can be found at hackint IRC [#//](https://webirc.hackint.org/#irc://irc.hackint.org/#//).

64
bad-params.txt Normal file

@ -0,0 +1,64 @@
utm_source
utm_medium
utm_campaign
utm_term
utm_content
utm_adgroup
ref
refsrc
referrer_id
referrerid
src
i
s
ts
feature
jsessionid
phpsessid
aspsessionid
sessionid
zenid
sid
gclid
fb_xd_fragment
fb_comment_id
fbclid
cfid
cftoken
doing_wp_cron
pk_cpn
pk_campaign
pk_kwd
pk_keyword
piwik_campaign
piwik_kwd
ga_source
ga_medium
ga_term
ga_content
ga_campaign
ga_place
yclid
_openstat
fb_action_ids
fb_action_types
fb_source
fb_ref
action_object_map
action_type_map
action_ref_map
gs_l
mkt_tok
hmb_campaign
hmb_medium
hmb_source
rand
wicket:antiCache
cachebuster
nocache
vs
dilid
script_case_session
cid
extid
_flowexecutionkey
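
The names above read like tracking, session and cache-busting query parameters. In this commit, `urls.lua` loads the list, makes each name case-insensitive, and removes matching parameters with Lua patterns (see `remove_param`). As a rough illustration only, the same idea can be sketched in Python with `urllib.parse`. This is a different technique from the Lua code: it splits the query on `&` only, re-encodes the remaining values, and the function name `strip_bad_params` is hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_bad_params(url, bad_params):
    """Drop query parameters whose names appear in bad_params.

    Names are compared case-insensitively, mirroring the intent of the
    case-insensitive patterns that urls.lua builds from bad-params.txt.
    """
    bad = {name.lower() for name in bad_params}
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in bad
    ]
    # urlencode() re-encodes the surviving pairs into a query string.
    return urlunsplit(parts._replace(query=urlencode(kept)))


print(strip_bad_params(
    'https://example.com/page?id=7&utm_source=feed&FBCLID=abc',
    ['utm_source', 'fbclid'],
))
```

Note that the Lua implementation also treats `;` as a separator, which this sketch does not.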

33
bad-patterns.txt Normal file

@ -0,0 +1,33 @@
/action/consumeSharedSessionAction
/action/consumeSsoCookie
/action/getSharedSiteSession
/juris/error%.jsf
facebook%.com/login%.php
facebook%.com/cookie/
facebook%.com/plugins/
facebook%.com/sharer/
facebook%.com/sharer%.php
gongquiz%.com.+&historyNo=[0-9]+
univis%.univie%.ac%.at/ausschreibungstellensuche/
fundraise%.cancerresearchuk%.org/signup/account/
mma%.ft%.com
^https?://dmg%.go%-2b%-planer%.de/
^https?://3d%.espace%-aubade%.fr/
^https?://kuechenplaner%.[^/]+/cloud/
^https?://3d%-salledebains%.geberit%.fr/
^https?://bibliotekanauki%.ceon%.pl/yadda/search/general%.action
^https?://[^/]+%.icm%.edu%.pl/.*search/article%.action
^https?://interamt%.de/koop/app/
^https?://tesiunam%.dgb%.unam%.mx/F/
^https?://[^%.]+%.sedelectronica%.es/.*%?x=
^https?://www%.cp%-cc%.org/programs%-services/
/ibank/_crypt_
%%7B%%7B.+%%7D%%7D
^https?://[^/]+/&quot;
^http://[0-9a-z][0-9a-z][0-9a-z][0-9][0-9][0-9]?%.[^%./]+%.com/$
^http://[0-9a-z][0-9a-z][0-9a-z][0-9][0-9][0-9]?%.[^%./]+%.com/[a-z]+%.?[a-z][a-z][a-z]?$
^http://[0-9a-z][0-9a-z][0-9a-z][0-9][0-9][0-9]?%.[^%./]+%.com/[a-z]+/[a-z]+[0-9]*%.?[a-z][a-z][a-z]?$
^https?://[^/]*yahoo%.com/.+%%5C.+at%.atwola%.com
^https?://[^/]*at%.atwola%.com/
^https?://www%.bafa%.de/
%%5C%%22
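
The entries above are Lua patterns, not regular expressions: `%` is the escape character, so `%.` is a literal dot and `%%` a literal percent sign. For anyone who wants to test a pattern outside of Wget, here is a hedged sketch that translates the subset of Lua pattern syntax used in these files into a Python regular expression. It assumes the patterns only use `%` followed by punctuation; Lua character classes such as `%d` and the lazy `-` quantifier are rejected instead of translated. The function name `lua_pattern_to_regex` is hypothetical and not part of this repository.

```python
import re


def lua_pattern_to_regex(pattern):
    """Translate a simple Lua pattern into a Python regex string.

    Only the subset seen in the pattern files is handled: '%' followed
    by a punctuation character, plus syntax the two dialects share.
    """
    out = []
    in_class = False
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == '%':
            if i + 1 >= len(pattern):
                raise ValueError('dangling % at end of pattern')
            nxt = pattern[i + 1]
            if nxt.isalnum():
                raise ValueError('Lua class %' + nxt + ' is not handled here')
            out.append(re.escape(nxt))
            i += 2
            continue
        if ch == '[' and not in_class:
            in_class = True
        elif ch == ']' and in_class:
            in_class = False
        elif ch == '-' and not in_class:
            raise ValueError('lazy "-" quantifier is not handled here')
        if ch in '{}|\\':
            # Literal in Lua patterns, special in Python regexes.
            out.append(re.escape(ch))
        else:
            out.append(ch)
        i += 1
    return ''.join(out)


regex = re.compile(lua_pattern_to_regex('facebook%.com/login%.php'))
print(bool(regex.search('https://www.facebook.com/login.php?next=x')))
```

Like Lua's `string.match`, `re.search` looks for the pattern anywhere in the string unless it is anchored with `^` or `$`.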

File diff suppressed because it is too large

57
get-wget-lua.sh Executable file

@ -0,0 +1,57 @@
#!/usr/bin/env bash
#
# This script clones and compiles wget-lua.
#
# first, try to detect gnutls or openssl
CONFIGURE_SSL_OPT=""
if builtin type -p pkg-config &>/dev/null
then
if pkg-config gnutls
then
echo "Compiling wget with GnuTLS."
CONFIGURE_SSL_OPT="--with-ssl=gnutls"
elif pkg-config openssl
then
echo "Compiling wget with OpenSSL."
CONFIGURE_SSL_OPT="--with-ssl=openssl"
fi
fi
if ! zstd --version | grep -qF '1.4.4'
then
echo "Need version 1.4.4 of libzstd-dev and zstd"
exit 1
fi
rm -rf get-wget-lua.tmp/
mkdir -p get-wget-lua.tmp
cd get-wget-lua.tmp
git clone https://github.com/archiveteam/wget-lua.git
cd wget-lua
git checkout v1.20.3-at
#echo -n 1.20.3-at-lua | tee ./.version ./.tarball-version > /dev/null
if ./bootstrap && ./configure $CONFIGURE_SSL_OPT --disable-nls && make && src/wget -V | grep -q lua
then
cp src/wget ../../wget-at
cd ../../
echo
echo
echo "###################################################################"
echo
echo "wget-lua successfully built."
echo
./wget-at --help | grep -iE "gnu|warc|lua"
rm -rf get-wget-lua.tmp
exit 0
else
echo
echo "wget-lua not successfully built."
echo
exit 1
fi

21
ignore-patterns.txt Normal file

@ -0,0 +1,21 @@
[%?&]ver=[0-9a-zA-Z%.]*%.16[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]
[%?&]ver=16[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]
[%?&]t=16[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]$
[%?&]t=16[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]%.[0-9]+$
[%?&]hash=16[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]$
%?16[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]$
%?16[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]$
%?6[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]$
%?v=[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]$
;extid=[0-9a-f]+$
[%?&;]_flowexecutionkey=
[%?&;]sid=
[%?&;]cid=
[%?&;]jsessionid=
[%?&;]script_case_session=
[%?&;]Dilid=
[%?&;][pP][hH][pP][sS][eE][sS][sS][iI][dD]=
[%?&;]wtd=
[%?&;]nonce=
[%?&;]rnd=
^https?://[^/]+/index%.php%?s=


@ -0,0 +1,17 @@
%.apng
%.avif
%.gif
%.jpe?g
%.jfif
%.pjpeg
%.pjp
%.png
%.svg
%.webp
%.bmp
%.ico
%.cur
%.tif
%.tiff
%.js
%.css

425
pipeline.py Normal file

@ -0,0 +1,425 @@
# encoding=utf8
import datetime
from distutils.version import StrictVersion
import hashlib
import json
import os
import random
import shutil
import socket
import subprocess
import sys
import threading
import time
import string
if sys.version_info[0] < 3:
    from urllib import unquote
    from urlparse import parse_qs
else:
    from urllib.parse import unquote, parse_qs
import requests
import seesaw
from seesaw.config import realize, NumberConfigValue
from seesaw.externalprocess import WgetDownload
from seesaw.item import ItemInterpolation, ItemValue
from seesaw.pipeline import Pipeline
from seesaw.project import Project
from seesaw.task import SimpleTask, LimitConcurrent
from seesaw.tracker import GetItemFromTracker, PrepareStatsForTracker, \
    UploadWithTracker, SendDoneToTracker
from seesaw.util import find_executable
import zstandard

if StrictVersion(seesaw.__version__) < StrictVersion('0.8.5'):
    raise Exception('This pipeline needs seesaw version 0.8.5 or higher.')
LOCK = threading.Lock()
###########################################################################
# Find a useful Wget+Lua executable.
#
# WGET_AT will be set to the first path that
# 1. does not crash with --version, and
# 2. prints the required version string
WGET_AT = find_executable(
    'Wget+AT',
    [
        'GNU Wget 1.20.3-at.20211001.01'
    ],
    [
        './wget-at',
        '/home/warrior/data/wget-at'
    ]
)
if not WGET_AT:
    raise Exception('No usable Wget+At found.')
###########################################################################
# The version number of this pipeline definition.
#
# Update this each time you make a non-cosmetic change.
# It will be added to the WARC files and reported to the tracker.
VERSION = '20220423.01'
#USER_AGENT = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.183 Safari/537.36'
TRACKER_ID = 'urls'
TRACKER_HOST = 'legacy-api.arpa.li'
MULTI_ITEM_SIZE = 40
MAX_DUPES_LIST_SIZE = 10000
###########################################################################
# This section defines project-specific tasks.
#
# Simple tasks (tasks that do not need any concurrency) are based on the
# SimpleTask class and have a process(item) method that is called for
# each item.
class CheckIP(SimpleTask):
    def __init__(self):
        SimpleTask.__init__(self, 'CheckIP')
        self._counter = 0

    def process(self, item):
        # NEW for 2014! Check if we are behind firewall/proxy
        if self._counter <= 0:
            item.log_output('Checking IP address.')
            ip_set = set()
            ip_set.add(socket.gethostbyname('twitter.com'))
            #ip_set.add(socket.gethostbyname('facebook.com'))
            ip_set.add(socket.gethostbyname('youtube.com'))
            ip_set.add(socket.gethostbyname('microsoft.com'))
            ip_set.add(socket.gethostbyname('icanhas.cheezburger.com'))
            ip_set.add(socket.gethostbyname('archiveteam.org'))
            if len(ip_set) != 5:
                item.log_output('Got IP addresses: {0}'.format(ip_set))
                item.log_output(
                    'Are you behind a firewall/proxy? That is a big no-no!')
                raise Exception(
                    'Are you behind a firewall/proxy? That is a big no-no!')
        # Check only occasionally
        if self._counter <= 0:
            self._counter = 10
        else:
            self._counter -= 1

class CheckRequirements(SimpleTask):
    def __init__(self):
        SimpleTask.__init__(self, 'CheckRequirements')
        self._checked = False

    def process(self, item):
        if not self._checked:
            assert shutil.which('pdftohtml') is not None
            self._checked = True

class PrepareDirectories(SimpleTask):
    def __init__(self, warc_prefix):
        SimpleTask.__init__(self, 'PrepareDirectories')
        self.warc_prefix = warc_prefix

    def process(self, item):
        item_name = item['item_name']
        item_name_hash = hashlib.sha1(item_name.encode('utf8')).hexdigest()
        escaped_item_name = item_name_hash
        dirname = '/'.join((item['data_dir'], escaped_item_name))
        if os.path.isdir(dirname):
            shutil.rmtree(dirname)
        os.makedirs(dirname)
        item['item_dir'] = dirname
        item['warc_file_base'] = '-'.join([
            self.warc_prefix,
            item_name_hash,
            time.strftime('%Y%m%d-%H%M%S')
        ])
        if not os.path.isfile('duplicate-urls.txt'):
            open('duplicate-urls.txt', 'w').close()
        open('%(item_dir)s/%(warc_file_base)s.warc.zst' % item, 'w').close()
        open('%(item_dir)s/%(warc_file_base)s_bad-urls.txt' % item, 'w').close()
        open('%(item_dir)s/%(warc_file_base)s_duplicate-urls.txt' % item, 'w').close()

class MoveFiles(SimpleTask):
    def __init__(self):
        SimpleTask.__init__(self, 'MoveFiles')

    def process(self, item):
        os.rename('%(item_dir)s/%(warc_file_base)s.warc.zst' % item,
            '%(data_dir)s/%(warc_file_base)s.%(dict_project)s.%(dict_id)s.warc.zst' % item)
        shutil.rmtree('%(item_dir)s' % item)

class SetBadUrls(SimpleTask):
    def __init__(self):
        SimpleTask.__init__(self, 'SetBadUrls')

    def unquote_url(self, url):
        temp = unquote(url)
        while url != temp:
            url = temp
            temp = unquote(url)
        return url

    def process(self, item):
        item['item_name_original'] = item['item_name']
        items = item['item_name'].split('\0')
        items_lower = [self.unquote_url(url).strip().lower() for url in item['item_urls']]
        with open('%(item_dir)s/%(warc_file_base)s_bad-urls.txt' % item, 'r') as f:
            for url in {
                self.unquote_url(url).strip().lower() for url in f
            }:
                index = items_lower.index(url)
                items.pop(index)
                items_lower.pop(index)
        item['item_name'] = '\0'.join(items)

class SetDuplicateUrls(SimpleTask):
    def __init__(self):
        SimpleTask.__init__(self, 'SetNewDuplicates')

    def process(self, item):
        with LOCK:
            self._process(item)

    def _process(self, item):
        with open('duplicate-urls.txt', 'r') as f:
            duplicates = {s.strip() for s in f}
        with open('%(item_dir)s/%(warc_file_base)s_duplicate-urls.txt' % item, 'r') as f:
            for url in f:
                duplicates.add(url.strip())
        with open('duplicate-urls.txt', 'w') as f:
            # choose randomly, to cycle periodically popular URLs
            duplicates = list(duplicates)
            random.shuffle(duplicates)
            f.write('\n'.join(duplicates[:MAX_DUPES_LIST_SIZE]))

class MaybeSendDoneToTracker(SendDoneToTracker):
    def enqueue(self, item):
        if len(item['item_name']) == 0:
            return self.complete_item(item)
        return super(MaybeSendDoneToTracker, self).enqueue(item)


def get_hash(filename):
    with open(filename, 'rb') as in_file:
        return hashlib.sha1(in_file.read()).hexdigest()

CWD = os.getcwd()
PIPELINE_SHA1 = get_hash(os.path.join(CWD, 'pipeline.py'))
LUA_SHA1 = get_hash(os.path.join(CWD, 'urls.lua'))

def stats_id_function(item):
    d = {
        'pipeline_hash': PIPELINE_SHA1,
        'lua_hash': LUA_SHA1,
        'python_version': sys.version,
    }
    return d

class ZstdDict(object):
    created = 0
    data = None

    @classmethod
    def get_dict(cls):
        if cls.data is not None and time.time() - cls.created < 1800:
            return cls.data
        response = requests.get(
            'https://legacy-api.arpa.li/dictionary',
            params={
                'project': TRACKER_ID
            }
        )
        response.raise_for_status()
        response = response.json()
        if cls.data is not None and response['id'] == cls.data['id']:
            cls.created = time.time()
            return cls.data
        print('Downloading latest dictionary.')
        response_dict = requests.get(response['url'])
        response_dict.raise_for_status()
        raw_data = response_dict.content
        if hashlib.sha256(raw_data).hexdigest() != response['sha256']:
            raise ValueError('Hash of downloaded dictionary does not match.')
        if raw_data[:4] == b'\x28\xB5\x2F\xFD':
            raw_data = zstandard.ZstdDecompressor().decompress(raw_data)
        cls.data = {
            'id': response['id'],
            'dict': raw_data
        }
        cls.created = time.time()
        return cls.data

class WgetArgs(object):
    def realize(self, item):
        with open('user-agents.txt', 'r') as f:
            USER_AGENT = random.choice(list(f)).strip()
        wget_args = [
            'timeout', '1000',
            WGET_AT,
            '-U', USER_AGENT,
            '-v',
            '--content-on-error',
            '--lua-script', 'urls.lua',
            '-o', ItemInterpolation('%(item_dir)s/wget.log'),
            #'--no-check-certificate',
            '--output-document', ItemInterpolation('%(item_dir)s/wget.tmp'),
            '--truncate-output',
            '-e', 'robots=off',
            '--rotate-dns',
            '--recursive', '--level=inf',
            '--no-parent',
            '--timeout', '10',
            '--tries', '2',
            '--span-hosts',
            '--page-requisites',
            '--waitretry', '0',
            '--warc-file', ItemInterpolation('%(item_dir)s/%(warc_file_base)s'),
            '--warc-header', 'operator: Archive Team',
            '--warc-header', 'x-wget-at-project-version: ' + VERSION,
            '--warc-header', 'x-wget-at-project-name: ' + TRACKER_ID,
            '--warc-dedup-url-agnostic',
            '--warc-compression-use-zstd',
            '--warc-zstd-dict-no-include',
            '--header', 'Connection: keep-alive',
            '--header', 'Accept-Language: en-US;q=0.9, en;q=0.8'
        ]
        dict_data = ZstdDict.get_dict()
        with open(os.path.join(item['item_dir'], 'zstdict'), 'wb') as f:
            f.write(dict_data['dict'])
        item['dict_id'] = dict_data['id']
        item['dict_project'] = TRACKER_ID
        wget_args.extend([
            '--warc-zstd-dict', ItemInterpolation('%(item_dir)s/zstdict'),
        ])
        item['item_name'] = '\0'.join([
            item_name for item_name in item['item_name'].split('\0')
            if (item_name.startswith('custom:') and '&url=' in item_name) \
                or item_name.startswith('http://') \
                or item_name.startswith('https://') \
        ])
        item['item_name_newline'] = item['item_name'].replace('\0', '\n')
        item_urls = []
        custom_items = {}
        for item_name in item['item_name'].split('\0'):
            wget_args.extend(['--warc-header', 'x-wget-at-project-item-name: '+item_name])
            wget_args.append('item-name://'+item_name)
            if item_name.startswith('custom:'):
                data = parse_qs(item_name.split(':', 1)[1])
                for k, v in data.items():
                    if len(v) == 1:
                        data[k] = v[0]
                url = data['url']
                custom_items[url.lower()] = data
            else:
                url = item_name
            item_urls.append(url)
            wget_args.append(url)
        item['item_urls'] = item_urls
        item['custom_items'] = json.dumps(custom_items)
        if 'bind_address' in globals():
            wget_args.extend(['--bind-address', globals()['bind_address']])
            print('')
            print('*** Wget will bind address at {0} ***'.format(
                globals()['bind_address']))
            print('')
        return realize(wget_args, item)

###########################################################################
# Initialize the project.
#
# This will be shown in the warrior management panel. The logo should not
# be too big. The deadline is optional.
project = Project(
title = 'URLs',
project_html = '''
<img class="project-logo" alt="logo" src="https://archiveteam.org/images/thumb/f/f3/Archive_team.png/235px-Archive_team.png" height="50px"/>
<h2>Archiving sets of discovered outlinks. &middot; <a href="http://tracker.archiveteam.org/urls/">Leaderboard</a></h2>
'''
)
pipeline = Pipeline(
CheckIP(),
CheckRequirements(),
GetItemFromTracker('https://{}/{}/multi={}/'
.format(TRACKER_HOST, TRACKER_ID, MULTI_ITEM_SIZE),
downloader, VERSION),
PrepareDirectories(warc_prefix='urls'),
WgetDownload(
WgetArgs(),
max_tries=1,
accept_on_exit_code=[0, 4, 8],
env={
'item_dir': ItemValue('item_dir'),
'item_name': ItemValue('item_name_newline'),
'custom_items': ItemValue('custom_items'),
'warc_file_base': ItemValue('warc_file_base')
}
),
SetBadUrls(),
SetDuplicateUrls(),
PrepareStatsForTracker(
defaults={'downloader': downloader, 'version': VERSION},
file_groups={
'data': [
ItemInterpolation('%(item_dir)s/%(warc_file_base)s.warc.zst')
]
},
id_function=stats_id_function,
),
MoveFiles(),
LimitConcurrent(NumberConfigValue(min=1, max=20, default='2',
name='shared:rsync_threads', title='Rsync threads',
description='The maximum number of concurrent uploads.'),
UploadWithTracker(
'https://%s/%s' % (TRACKER_HOST, TRACKER_ID),
downloader=downloader,
version=VERSION,
files=[
ItemInterpolation('%(data_dir)s/%(warc_file_base)s.%(dict_project)s.%(dict_id)s.warc.zst')
],
rsync_target_source_path=ItemInterpolation('%(data_dir)s/'),
rsync_extra_args=[
'--recursive',
'--partial',
'--partial-dir', '.rsync-tmp',
'--min-size', '1',
'--no-compress',
'--compress-level', '0'
]
),
),
MaybeSendDoneToTracker(
tracker_url='https://%s/%s' % (TRACKER_HOST, TRACKER_ID),
stats=ItemValue('stats')
)
)
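
The item names handled by `WgetArgs.realize` come in two shapes: plain `http://` or `https://` URLs, and `custom:` items whose settings are carried as a URL-encoded query string with a mandatory `url` key. The following standalone sketch isolates that parsing step so it can be tried without seesaw. It follows the same `parse_qs` approach as the pipeline, but the function name `parse_item_name` and its return shape are hypothetical.

```python
from urllib.parse import parse_qs


def parse_item_name(item_name):
    """Return (url, settings) for a single item name.

    Plain URLs have no settings, so None is returned for them. For
    'custom:' items, single-valued settings are unwrapped from the
    one-element lists that parse_qs() produces.
    """
    if item_name.startswith('custom:'):
        data = parse_qs(item_name.split(':', 1)[1])
        settings = {
            key: values[0] if len(values) == 1 else values
            for key, values in data.items()
        }
        return settings['url'], settings
    return item_name, None


print(parse_item_name('https://example.com/'))
print(parse_item_name(
    'custom:random=202205&url=https%3A%2F%2Fexample.com%2F'
))
```

The second call uses the same `custom:random=...&url=...` layout that `queue_monthly_url` in `urls.lua` builds.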

942
urls.lua Normal file

@ -0,0 +1,942 @@
local urlparse = require("socket.url")
local http = require("socket.http")
JSON = (loadfile "JSON.lua")()
local item_dir = os.getenv("item_dir")
local item_name = os.getenv("item_name")
local custom_items = os.getenv("custom_items")
local warc_file_base = os.getenv("warc_file_base")
local url_count = 0
local downloaded = {}
local abortgrab = false
local exit_url = false
local min_dedup_mb = 5
local timestamp = nil
if urlparse == nil or http == nil then
io.stdout:write("socket not correctly installed.\n")
io.stdout:flush()
abortgrab = true
end
local urls = {}
for url in string.gmatch(item_name, "([^\n]+)") do
urls[string.lower(url)] = true
end
local urls_settings = JSON:decode(custom_items)
for k, _ in pairs(urls_settings) do
urls[string.lower(k)] = true
end
local status_code = nil
local redirect_urls = {}
local visited_urls = {}
local ids_to_ignore = {}
for _, lengths in pairs({{8, 4, 4, 4, 12}, {8, 4, 4, 12}}) do
local uuid = ""
for _, i in pairs(lengths) do
for j=1,i do
uuid = uuid .. "[0-9a-fA-F]"
end
if i ~= 12 then
uuid = uuid .. "%-"
end
end
ids_to_ignore[uuid] = true
end
local to_ignore = ""
for i=1,9 do
to_ignore = to_ignore .. "[0-9]"
end
ids_to_ignore["%?" .. to_ignore .. "$"] = true
ids_to_ignore["%?" .. to_ignore .. "[0-9]$"] = true
ids_to_ignore[to_ignore .. "[0-9]%.[0-9][0-9][0-9][0-9]$"] = true
to_ignore = ""
for i=1,50 do
to_ignore = to_ignore .. "[0-9a-zA-Z]"
end
ids_to_ignore[to_ignore .. "%-[0-9][0-9][0-9][0-9][0-9]"] = true
ids_to_ignore["[0-9a-zA-Z%-_]!%-?[0-9]"] = true
to_ignore = ""
for i=1,32 do
to_ignore = to_ignore .. "[0-9a-fA-F]"
end
ids_to_ignore["[^0-9a-fA-F]" .. to_ignore .. "[^0-9a-fA-F]"] = true
ids_to_ignore["[^0-9a-fA-F]" .. to_ignore .. "$"] = true
local current_url = nil
local current_settings = nil
local bad_urls = {}
local queued_urls = {}
local bad_params = {}
local bad_patterns = {}
local ignore_patterns = {}
local page_requisite_patterns = {}
local duplicate_urls = {}
local extract_outlinks_patterns = {}
local item_first_url = nil
local redirect_domains = {}
local checked_domains = {}
local parenturl_uuid = nil
local parenturl_requisite = nil
local dupes_file = io.open("duplicate-urls.txt", "r")
for url in dupes_file:lines() do
duplicate_urls[url] = true
end
dupes_file:close()
local bad_params_file = io.open("bad-params.txt", "r")
for param in bad_params_file:lines() do
local param = string.gsub(
param, "([a-zA-Z])",
function(c)
return "[" .. string.lower(c) .. string.upper(c) .. "]"
end
)
table.insert(bad_params, param)
end
bad_params_file:close()
local bad_patterns_file = io.open("bad-patterns.txt", "r")
for pattern in bad_patterns_file:lines() do
table.insert(bad_patterns, pattern)
end
bad_patterns_file:close()
local ignore_patterns_file = io.open("ignore-patterns.txt", "r")
for pattern in ignore_patterns_file:lines() do
table.insert(ignore_patterns, pattern)
end
ignore_patterns_file:close()
local page_requisite_patterns_file = io.open("page-requisite-patterns.txt", "r")
for pattern in page_requisite_patterns_file:lines() do
table.insert(page_requisite_patterns, pattern)
end
page_requisite_patterns_file:close()
local extract_outlinks_patterns_file = io.open("extract-outlinks-patterns.txt", "r")
for pattern in extract_outlinks_patterns_file:lines() do
extract_outlinks_patterns[pattern] = true
end
extract_outlinks_patterns_file:close()
read_file = function(file, bytes)
if not bytes then
bytes = "*all"
end
if file then
local f = assert(io.open(file))
local data = f:read(bytes)
f:close()
if not data then
data = ""
end
return data
else
return ""
end
end
table_length = function(t)
local count = 0
for _ in pairs(t) do
count = count + 1
end
return count
end
check_domain_outlinks = function(url, target)
local parent = string.match(url, "^https?://([^/]+)")
while parent do
if (not target and extract_outlinks_patterns[parent])
or (target and parent == target) then
return parent
end
parent = string.match(parent, "^[^%.]+%.(.+)$")
end
return false
end
bad_code = function(status_code)
return status_code ~= 200
and status_code ~= 301
and status_code ~= 302
and status_code ~= 303
and status_code ~= 307
and status_code ~= 308
and status_code ~= 404
and status_code ~= 410
end
find_path_loop = function(url, max_repetitions)
local tested = {}
for s in string.gmatch(urlparse.unescape(url), "([^/]+)") do
s = string.lower(s)
if not tested[s] then
if s == "" then
tested[s] = -2
else
tested[s] = 0
end
end
tested[s] = tested[s] + 1
if tested[s] == max_repetitions then
return true
end
end
return false
end
percent_encode_url = function(url)
local temp = ""
for c in string.gmatch(url, "(.)") do
local b = string.byte(c)
if b < 32 or b > 126 then
c = string.format("%%%02X", b)
end
temp = temp .. c
end
return temp
end
queue_url = function(url, withcustom)
if not url then
return nil
end
queue_new_urls(url)
if not string.match(url, "^https?://[^/]+%.") then
return nil
end
--local original = url
load_setting_depth = function(s)
local n = tonumber(current_settings[s])
if n == nil then
n = 0
end
return n - 1
end
url = string.gsub(url, "'%s*%+%s*'", "")
url = percent_encode_url(url)
url = string.match(url, "^([^{]+)")
url = string.match(url, "^([^<]+)")
url = string.match(url, "^([^\\]+)")
if current_settings and current_settings["all"] and withcustom then
local depth = load_setting_depth("depth")
local keep_random = load_setting_depth("keep_random")
local keep_all = load_setting_depth("keep_all")
local any_domain = load_setting_depth("any_domain")
if depth >= 0 then
local random = current_settings["random"]
local all = current_settings["all"]
if keep_random < 0 or random == "" then
random = nil
keep_random = nil
end
if keep_all < 0 or all == 0 then
all = nil
keep_all = nil
end
if any_domain <= 0 then
any_domain = nil
end
local settings = {
depth=depth,
all=all,
keep_all=keep_all,
random=random,
keep_random=keep_random,
url=url,
any_domain=any_domain
}
url = "custom:"
for _, k in pairs(
{"all", "any_domain", "depth", "keep_all", "keep_random", "random", "url"}
) do
local v = settings[k]
if v ~= nil then
url = url .. k .. "=" .. urlparse.escape(tostring(v)) .. "&"
end
end
url = string.sub(url, 1, -2)
end
end
if not duplicate_urls[url] and not queued_urls[url] then
if find_path_loop(url, 2) then
return false
end
--print("queuing",original, url)
queued_urls[url] = true
end
end
queue_monthly_url = function(url)
local random_s = os.date("%Y%m", timestamp)
url = percent_encode_url(url)
queued_urls["custom:random=" .. random_s .. "&url=" .. urlparse.escape(tostring(url))] = true
end
remove_param = function(url, param_pattern)
local newurl = url
repeat
url = newurl
newurl = string.gsub(url, "([%?&;])" .. param_pattern .. "=[^%?&;]*[%?&;]?", "%1")
until newurl == url
return string.match(newurl, "^(.-)[%?&;]?$")
end
queue_new_urls = function(url)
if not url then
return nil
end
local newurl = string.gsub(url, "([%?&;])[aA][mM][pP];", "%1")
if url == current_url then
if newurl ~= url then
queue_url(newurl)
end
end
for _, param_pattern in pairs(bad_params) do
newurl = remove_param(newurl, param_pattern)
end
if newurl ~= url then
queue_url(newurl)
end
newurl = string.match(newurl, "^([^%?&]+)")
if newurl ~= url then
queue_url(newurl)
end
url = string.gsub(url, "&quot;", '"')
url = string.gsub(url, "&amp;", "&")
for newurl in string.gmatch(url, '([^"\\]+)') do
if newurl ~= url then
queue_url(newurl)
end
end
end
report_bad_url = function(url)
if current_url ~= nil then
bad_urls[current_url] = true
else
bad_urls[string.lower(url)] = true
end
end
strip_url = function(url)
url = string.match(url, "^https?://(.+)$")
local newurl = string.match(url, "^www%.(.+)$")
if newurl then
url = newurl
end
return url
end
wget.callbacks.download_child_p = function(urlpos, parent, depth, start_url_parsed, iri, verdict, reason)
local url = urlpos["url"]["url"]
local parenturl = parent["url"]
local extract_page_requisites = false
local current_settings_all = current_settings and current_settings["all"]
local current_settings_any_domain = current_settings and current_settings["any_domain"]
--queue_monthly_url(string.match(url, "^(https?://[^/]+)") .. "/")
if redirect_urls[parenturl] and not (
status_code == 300 and string.match(parenturl, "^https?://[^/]*feb%-web%.ru/")
) then
return true
end
if find_path_loop(url, 2) then
return false
end
local _, count = string.gsub(url, "[/%?]", "")
if count >= 16 then
return false
end
for _, extension in pairs({
"pdf",
"doc[mx]?",
"xls[mx]?",
"ppt[mx]?",
"zip",
"odt",
"odm",
"ods",
"odp",
"xml",
"json",
"torrent"
}) do
if string.match(parenturl, "%." .. extension .. "$")
or string.match(parenturl, "%." .. extension .. "[^a-z0-9A-Z]")
or string.match(parenturl, "%." .. string.upper(extension) .. "$")
or string.match(parenturl, "%." .. string.upper(extension) .. "[^a-z0-9A-Z]") then
return false
end
if string.match(url, "%." .. extension .. "$")
or string.match(url, "%." .. extension .. "[^a-z0-9A-Z]")
or string.match(url, "%." .. string.upper(extension) .. "$")
or string.match(url, "%." .. string.upper(extension) .. "[^a-z0-9A-Z]") then
queue_url(url)
return false
end
end
local domain_match = checked_domains[item_first_url]
if not domain_match then
domain_match = check_domain_outlinks(item_first_url)
if not domain_match then
domain_match = "none"
end
checked_domains[item_first_url] = domain_match
end
if domain_match ~= "none" then
extract_page_requisites = true
local newurl_domain = string.match(url, "^https?://([^/]+)")
local to_queue = true
for domain, _ in pairs(redirect_domains) do
if check_domain_outlinks(url, domain) then
to_queue = false
break
end
end
if to_queue then
queue_url(url)
return false
end
end
--[[if not extract_page_requisites then
return false
end]]
if (status_code < 200 or status_code >= 300 or not verdict)
and not current_settings_all then
return false
end
--[[if string.len(url) == string.len(parenturl) then
local good_url = false
local index1, index2
temp_url = string.match(url, "^https?://(.+)$")
temp_parenturl = string.match(parenturl, "^https?://(.+)$")
local start_index = 1
repeat
index1 = string.find(temp_url, "/", start_index)
index2 = string.find(temp_parenturl, "/", start_index)
if index1 ~= index2 then
good_url = true
break
end
if index1 then
start_index = index1 + 1
end
until not index1 or not index2
if not good_url then
return false
end
end]]
if parenturl_uuid == nil then
parenturl_uuid = false
for old_parent_url, _ in pairs(visited_urls) do
for id_to_ignore, _ in pairs(ids_to_ignore) do
if string.match(old_parent_url, id_to_ignore) then
parenturl_uuid = true
break
end
end
if parenturl_uuid then
break
end
end
end
if parenturl_uuid then
for id_to_ignore, _ in pairs(ids_to_ignore) do
if string.match(url, id_to_ignore) and not current_settings_all then
return false
end
end
end
if urlpos["link_refresh_p"] ~= 0 then
queue_url(url)
return false
end
if parenturl_requisite == nil then
parenturl_requisite = false
for _, pattern in pairs(page_requisite_patterns) do
for old_parent_url, _ in pairs(visited_urls) do
if string.match(old_parent_url, pattern) then
parenturl_requisite = true
break
end
end
if parenturl_requisite then
break
end
end
end
if parenturl_requisite and not current_settings_all then
return false
end
if urlpos["link_inline_p"] ~= 0 then
queue_url(url)
return false
end
local current_host = string.match(urlpos["url"]["host"], "([^%.]+%.[^%.]+)$")
local first_parent_host = string.match(parent["host"], "([^%.]+%.[^%.]+)$")
if current_url then
first_parent_host = string.match(current_url .. "/", "^https?://[^/]-([^/%.]+%.[^/%.]+)/")
end
if current_settings_all and (
current_settings_any_domain
or first_parent_host == current_host
) then
queue_url(url, true)
return false
end
--[[for old_parent_url, _ in pairs(visited_urls) do
for _, pattern in pairs(page_requisite_patterns) do
if string.match(old_parent_url, pattern) then
return false
end
end
end
for _, pattern in pairs(page_requisite_patterns) do
if string.match(url, pattern) then
queue_url(url)
return false
end
end]]
end
wget.callbacks.get_urls = function(file, url, is_css, iri)
local html = nil
if url then
downloaded[url] = true
end
local function check(url, headers)
local url = string.match(url, "^([^#]+)")
url = string.gsub(url, "&amp;", "&")
queue_url(url)
end
local function checknewurl(newurl, headers)
if string.match(newurl, "^#") then
return nil
end
if string.match(newurl, "\\[uU]002[fF]") then
return checknewurl(string.gsub(newurl, "\\[uU]002[fF]", "/"), headers)
end
if string.match(newurl, "^https?:////") then
check(string.gsub(newurl, ":////", "://"), headers)
elseif string.match(newurl, "^https?://") then
check(newurl, headers)
elseif string.match(newurl, "^https?:\\/\\?/") then
check(string.gsub(newurl, "\\", ""), headers)
elseif not url then
return nil
elseif string.match(newurl, "^\\/") then
checknewurl(string.gsub(newurl, "\\", ""), headers)
elseif string.match(newurl, "^//") then
check(urlparse.absolute(url, newurl), headers)
elseif string.match(newurl, "^/") then
check(urlparse.absolute(url, newurl), headers)
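elseif string.match(newurl, "^%.%./") then
if string.match(url, "^https?://[^/]+/[^/]+/") then
check(urlparse.absolute(url, newurl), headers)
else
-- A bare "../" leaves nothing after the dots are stripped, so skip it instead of passing nil on.
local stripped = string.match(newurl, "^%.%.(/.+)$")
if stripped then
checknewurl(stripped, headers)
end
end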
elseif string.match(newurl, "^%./") then
check(urlparse.absolute(url, newurl), headers)
end
end
local function checknewshorturl(newurl, headers)
if string.match(newurl, "^#") then
return nil
end
if url and string.match(newurl, "^%?") then
check(urlparse.absolute(url, newurl), headers)
elseif url and not (string.match(newurl, "^https?:\\?/\\?//?/?")
or string.match(newurl, "^[/\\]")
or string.match(newurl, "^%./")
or string.match(newurl, "^[jJ]ava[sS]cript:")
or string.match(newurl, "^[mM]ail[tT]o:")
or string.match(newurl, "^vine:")
or string.match(newurl, "^android%-app:")
or string.match(newurl, "^ios%-app:")
or string.match(newurl, "^%${")) then
check(urlparse.absolute(url, newurl), headers)
else
checknewurl(newurl, headers)
end
end
if (status_code == 200 and current_settings and current_settings["deep_extract"])
or not url then
html = read_file(file)
if not url then
html = string.gsub(html, "&#160;", " ")
html = string.gsub(html, "&lt;", "<")
html = string.gsub(html, "&gt;", ">")
html = string.gsub(html, "&quot;", '"')
html = string.gsub(html, "&apos;", "'")
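-- Decode numeric character references. string.char raises an error above 255,
-- so larger code points are left untouched (returning nil keeps the match as-is).
html = string.gsub(html, "&#(%d+);",
function(n)
local code = tonumber(n)
if code and code <= 255 then
return string.char(code)
end
end
)
-- Hexadecimal references need %x (hex digits), not %d.
html = string.gsub(html, "&#[xX](%x+);",
function(n)
local code = tonumber(n, 16)
if code and code <= 255 then
return string.char(code)
end
end
)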
local temp_html = string.gsub(html, "\n", "")
for _, remove in pairs({"", "<br/>", "</?p[^>]*>"}) do
if remove ~= "" then
temp_html = string.gsub(temp_html, remove, "")
end
for newurl in string.gmatch(temp_html, "(https?://[^%s<>#\"'\\`{})%]]+)") do
while string.match(newurl, "[%.&,!;]$") do
newurl = string.match(newurl, "^(.+).$")
end
check(newurl)
end
end
end
for newurl in string.gmatch(html, "[^%-][hH][rR][eE][fF]='([^']+)'") do
checknewshorturl(newurl)
end
for newurl in string.gmatch(html, '[^%-][hH][rR][eE][fF]="([^"]+)"') do
checknewshorturl(newurl)
end
for newurl in string.gmatch(string.gsub(html, "&[qQ][uU][oO][tT];", '"'), '"(https?://[^"]+)') do
checknewurl(newurl)
end
for newurl in string.gmatch(string.gsub(html, "&#039;", "'"), "'(https?://[^']+)") do
checknewurl(newurl)
end
if url then
for newurl in string.gmatch(html, ">%s*([^<%s]+)") do
checknewurl(newurl)
end
end
--[[for newurl in string.gmatch(html, "%(([^%)]+)%)") do
checknewurl(newurl)
end]]
elseif string.match(url, "^https?://[^/]+/.*[^a-z0-9A-Z][pP][dD][fF]$")
or string.match(url, "^https?://[^/]+/.*[^a-z0-9A-Z][pP][dD][fF][^a-z0-9A-Z]")
or string.match(read_file(file, 4), "%%[pP][dD][fF]") then
io.stdout:write("Extracting links from PDF.\n")
io.stdout:flush()
local temp_file = file .. "-html.html"
local check_file = io.open(temp_file)
if check_file then
check_file:close()
os.remove(temp_file)
end
os.execute("pdftohtml -nodrm -hidden -i -s -q " .. file)
check_file = io.open(temp_file)
if check_file then
check_file:close()
local temp_length = table_length(queued_urls)
wget.callbacks.get_urls(temp_file, nil, nil, nil)
io.stdout:write("Found " .. tostring(table_length(queued_urls)-temp_length) .. " URLs.\n")
io.stdout:flush()
os.remove(temp_file)
else
io.stdout:write("Not a PDF.\n")
io.stdout:flush()
end
end
end
wget.callbacks.write_to_warc = function(url, http_stat)
local url_lower = string.lower(url["url"])
if urls[url_lower] then
current_url = url_lower
current_settings = urls_settings[url_lower]
end
if current_settings and not current_settings["random"] then
queue_url(url["url"])
return false
end
if bad_code(http_stat["statcode"]) then
return false
elseif http_stat["statcode"] >= 300 and http_stat["statcode"] <= 399 then
local newloc = urlparse.absolute(url["url"], http_stat["newloc"])
if string.match(newloc, "^https?://[^/]*google%.com/sorry")
or string.match(newloc, "^https?://[^/]*google%.com/[sS]ervice[lL]ogin")
or string.match(newloc, "^https?://consent%.youtube%.com/")
or string.match(newloc, "^https?://consent%.google%.com/")
or string.match(newloc, "^https?://misuse%.ncbi%.nlm%.nih%.gov/")
or string.match(newloc, "^https?://myprivacy%.dpgmedia%.nl/")
or string.match(newloc, "^https?://idp%.springer%.com/authorize%?")
or string.match(newloc, "^https?://[^/]*instagram%.com/accounts/") then
report_bad_url(url["url"])
exit_url = true
return false
end
return true
elseif http_stat["statcode"] ~= 200 then
return true
end
-- Wayback Machine deduplication is disabled: this early return makes the check below unreachable.
if true then
return true
end
if http_stat["len"] > min_dedup_mb * 1024 * 1024 then
io.stdout:write("Data larger than " .. tostring(min_dedup_mb) .. " MB. Checking with Wayback Machine.\n")
io.stdout:flush()
while true do
local body, code, headers, status = http.request(
"https://web.archive.org/__wb/calendarcaptures/2"
.. "?url=" .. urlparse.escape(url["url"])
.. "&date=202"
)
if code ~= 200 then
io.stdout:write("Got " .. tostring(code) .. " from the Wayback Machine.\n")
io.stdout:flush()
os.execute("sleep 10")
else
local data = JSON:decode(body)
if not data["items"] or not data["colls"] then
return true
end
for _, item in pairs(data["items"]) do
if item[2] == 200 then
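-- item[3] is a zero-based index into data["colls"]; Lua tables are one-based.
local coll_index = item[3]
if not coll_index then
io.stdout:write("Could not get coll ID.\n")
io.stdout:flush()
end
local collections = coll_index and data["colls"][coll_index + 1]
if not collections then
io.stdout:write("Could not get collections.\n")
io.stdout:flush()
collections = {}
end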
for _, collection in pairs(collections) do
if collection == "archivebot"
or string.find(collection, "archiveteam") then
io.stdout:write("Archive Team got this URL before.\n")
return false
end
end
end
end
break
end
end
end
return true
end
wget.callbacks.httploop_result = function(url, err, http_stat)
status_code = http_stat["statcode"]
parenturl_uuid = nil
parenturl_requisite = nil
local url_lower = string.lower(url["url"])
if urls[url_lower] then
current_url = url_lower
current_settings = urls_settings[url_lower]
end
if not timestamp then
local body, code, headers, status = http.request("https://legacy-api.arpa.li/now")
assert(code == 200)
timestamp = tonumber(string.match(body, "^([0-9]+)"))
end
if status_code ~= 0 then
local base_url = string.match(url["url"], "^(https://[^/]+)")
if base_url then
for _, newurl in pairs({
base_url .. "/robots.txt",
base_url .. "/favicon.ico",
base_url .. "/"
}) do
queue_monthly_url(newurl)
end
end
end
url_count = url_count + 1
io.stdout:write(url_count .. "=" .. status_code .. " " .. url["url"] .. " \n")
io.stdout:flush()
if redirect_domains["done"] then
redirect_domains = {}
redirect_urls = {}
visited_urls = {}
item_first_url = nil
end
redirect_domains[string.match(url["url"], "^https?://([^/]+)")] = true
if not item_first_url then
item_first_url = url["url"]
end
visited_urls[url["url"]] = true
if exit_url then
exit_url = false
return wget.actions.EXIT
end
if status_code >= 300 and status_code <= 399 then
local newloc = urlparse.absolute(url["url"], http_stat["newloc"])
redirect_urls[url["url"]] = true
--[[if strip_url(url["url"]) == strip_url(newloc) then
queued_urls[newloc] = true
return wget.actions.EXIT
end]]
if downloaded[newloc] then
return wget.actions.EXIT
elseif string.match(url["url"], "^https?://[^/]*telegram%.org/dl%?tme=")
or (
string.match(newloc, "^https?://www%.(.+)")
or string.match(newloc, "^https?://(.+)")
) == (
string.match(url["url"], "^https?://www%.(.+)")
or string.match(url["url"], "^https?://(.+)")
)
or status_code == 301
or status_code == 308 then
queue_url(newloc)
return wget.actions.EXIT
end
else
redirect_domains["done"] = true
end
if downloaded[url["url"]] then
report_bad_url(url["url"])
return wget.actions.EXIT
end
for _, pattern in pairs(ignore_patterns) do
if string.match(url["url"], pattern) then
return wget.actions.EXIT
end
end
if status_code >= 200 and status_code <= 399 then
downloaded[url["url"]] = true
end
if status_code >= 200 and status_code < 300 then
queue_new_urls(url["url"])
end
if bad_code(status_code) then
io.stdout:write("Server returned " .. http_stat.statcode .. " (" .. err .. ").\n")
io.stdout:flush()
report_bad_url(url["url"])
return wget.actions.EXIT
end
local sleep_time = 0
if sleep_time > 0.001 then
os.execute("sleep " .. sleep_time)
end
return wget.actions.NOTHING
end
wget.callbacks.finish = function(start_time, end_time, wall_time, numurls, total_downloaded_bytes, total_download_time)
local function submit_backfeed(newurls)
local tries = 0
local maxtries = 4
while tries < maxtries do
local body, code, headers, status = http.request(
"https://legacy-api.arpa.li/backfeed/legacy/urls-glx7ansh4e17aii",
newurls .. "\0"
)
print(body)
if code == 200 then
io.stdout:write("Submitted discovered URLs.\n")
io.stdout:flush()
break
end
io.stdout:write("Failed to submit discovered URLs: " .. tostring(code) .. " " .. tostring(body) .. "\n")
io.stdout:flush()
-- Exponential backoff: 1, 2, 4, 8 seconds. 2 ^ tries also works where math.pow is unavailable.
os.execute("sleep " .. math.floor(2 ^ tries))
tries = tries + 1
end
if tries == maxtries then
abortgrab = true
end
end
local newurls = nil
local is_bad = false
local count = 0
local dup_urls = io.open(item_dir .. "/" .. warc_file_base .. "_duplicate-urls.txt", "w")
for url, _ in pairs(queued_urls) do
-- Reset per URL so a match from an earlier URL cannot leak into this iteration.
is_bad = false
for _, pattern in pairs(bad_patterns) do
is_bad = string.match(url, pattern)
if is_bad then
io.stdout:write("Filtering out URL " .. url .. ".\n")
io.stdout:flush()
break
end
end
if not is_bad then
io.stdout:write("Queuing URL " .. url .. ".\n")
io.stdout:flush()
dup_urls:write(url .. "\n")
if newurls == nil then
newurls = url
else
newurls = newurls .. "\0" .. url
end
count = count + 1
if count == 100 then
submit_backfeed(newurls)
newurls = nil
count = 0
end
end
end
if newurls ~= nil then
submit_backfeed(newurls)
end
dup_urls:close()
local file = io.open(item_dir .. "/" .. warc_file_base .. "_bad-urls.txt", "w")
for url, _ in pairs(bad_urls) do
file:write(url .. "\n")
end
file:close()
end
wget.callbacks.before_exit = function(exit_status, exit_status_string)
if abortgrab then
return wget.exits.IO_FAIL
end
return exit_status
end

381
user-agents.txt Normal file

@ -0,0 +1,381 @@
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:40.0) Gecko/20100101 Firefox/62.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:43.0) Gecko/20100101 Firefox/43.0 SeaMonkey/2.40
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:45.0) Gecko/20100101 Firefox/45.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:47.0) Gecko/20100101 Firefox/47.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:48.0) Gecko/20100101 Firefox/48.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:49.0) Gecko/20100101 Firefox/49.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:50.0) Gecko/20100101 Firefox/50.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:52.0) Gecko/20100101 Firefox/52.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:54.0) Gecko/20100101 Firefox/54.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:56.0) Gecko/20100101 Firefox/56.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:56.0) Gecko/20100101 Firefox/56.0.4 Waterfox/56.0.4
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:56.0; Waterfox) Gecko/20100101 Firefox/56.2.3
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:56.0; Waterfox) Gecko/20100101 Firefox/56.2.4
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:56.0; Waterfox) Gecko/20100101 Firefox/56.2.5
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:58.0) Gecko/20100101 Firefox/58.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:59.0) Gecko/20100101 Firefox/59.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:60.0) Gecko/20100101 Firefox/60.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:61.0) Gecko/20100101 Firefox/61.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:62.0) Gecko/20100101 Firefox/62.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:63.0) Gecko/20100101 Firefox/63.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:64.0) Gecko/20100101 Firefox/64.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:65.0) Gecko/20100101 Firefox/65.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:43.0) Gecko/20100101 Firefox/43.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:43.0) Gecko/20100101 Firefox/43.0 SeaMonkey/2.40
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:47.0) Gecko/20100101 Firefox/47.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:48.0) Gecko/20100101 Firefox/48.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:49.0) Gecko/20100101 Firefox/49.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:50.0) Gecko/20100101 Firefox/50.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:51.0) Gecko/20100101 Firefox/51.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:51.0) Gecko/20100101 Firefox/51.0 SeaMonkey/2.48
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:52.0) Gecko/20100101 Firefox/52.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:54.0) Gecko/20100101 Firefox/54.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:55.0) Gecko/20100101 Firefox/55.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:56.0) Gecko/20100101 Firefox/56.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:56.0) Gecko/20100101 Firefox/56.0.4 Waterfox/56.0.4
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:56.0; Waterfox) Gecko/20100101 Firefox/56.2.3
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:56.0; Waterfox) Gecko/20100101 Firefox/56.2.4
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:56.0; Waterfox) Gecko/20100101 Firefox/56.2.5
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:57.0) Gecko/20100101 Firefox/57.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:58.0) Gecko/20100101 Firefox/58.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:59.0) Gecko/20100101 Firefox/59.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:60.0) Gecko/20100101 Firefox/60.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:61.0) Gecko/20100101 Firefox/61.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:62.0) Gecko/20100101 Firefox/62.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:63.0) Gecko/20100101 Firefox/63.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:64.0) Gecko/20100101 Firefox/64.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:49.0) Gecko/20100101 Firefox/49.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:50.0) Gecko/20100101 Firefox/50.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:51.0) Gecko/20100101 Firefox/51.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:52.0) Gecko/20100101 Firefox/52.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:53.0) Gecko/20100101 Firefox/53.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:54.0) Gecko/20100101 Firefox/54.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:55.0) Gecko/20100101 Firefox/55.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:56.0) Gecko/20100101 Firefox/56.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:56.0; Waterfox) Gecko/20100101 Firefox/56.2.5
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:57.0) Gecko/20100101 Firefox/57.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:58.0) Gecko/20100101 Firefox/58.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:59.0) Gecko/20100101 Firefox/59.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:59.0.2) Gecko/20100101 Firefox/59.0.2
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:60.0) Gecko/20100101 Firefox/60.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:61.0) Gecko/20100101 Firefox/61.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:61.0) Gecko/20100101 Firefox/62.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:62.0) Gecko/20100101 Firefox/62.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:63.0) Gecko/20100101 Firefox/63.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:64.0) Gecko/20100101 Firefox/64.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.13; rv:40.0) Gecko/20100101 Firefox/40.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.13; rv:48.0) Gecko/20100101 Firefox/48.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.13; rv:52.0) Gecko/20100101 Firefox/52.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.13; rv:52.0) Gecko/20100101 Firefox/52.0 SeaMonkey/2.49.3
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.13; rv:55.0) Gecko/20100101 Firefox/55.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.13; rv:56.0) Gecko/20100101 Firefox/56.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.13; rv:56.0) Gecko/20100101 Firefox/56.0.4 Waterfox/56.0.4
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.13; rv:56.0; Waterfox) Gecko/20100101 Firefox/56.2.3
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.13; rv:56.0; Waterfox) Gecko/20100101 Firefox/56.2.4
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.13; rv:56.0; Waterfox) Gecko/20100101 Firefox/56.2.5
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.13; rv:57.0) Gecko/20100101 Firefox/57.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.13; rv:57.0) Gecko/20100101 Firefox/99.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.13; rv:58.0) Gecko/20100101 Firefox/58.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.13; rv:59.0) Gecko/20100101 Firefox/59.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.13; rv:60.0) Gecko/20100101 Firefox/60.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.13; rv:61.0) Gecko/20100101 Firefox/61.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.13; rv:62.0) Gecko/20100101 Firefox/62.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.13; rv:63.0) Gecko/20100101 Firefox/63.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.13; rv:64.0) Gecko/20100101 Firefox/64.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.13; rv:65.0) Gecko/20100101 Firefox/65.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:52.0) Gecko/20100101 Firefox/52.0 SeaMonkey/2.49.2
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:56.0; Waterfox) Gecko/20100101 Firefox/56.2.5
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:57.0) Gecko/20100101 Firefox/57.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:60.0) Gecko/20100101 Firefox/60.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:61.0) Gecko/20100101 Firefox/61.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:62.0) Gecko/20100101 Firefox/62.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:63.0) Gecko/20100101 Firefox/63.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:64.0) Gecko/20100101 Firefox/64.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:65.0) Gecko/20100101 Firefox/65.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:40.0) Gecko/20100101 Firefox/40.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:45.0) Gecko/20100101 Firefox/45.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:47.0) Gecko/20100101 Firefox/47.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:48.0) Gecko/20100101 Firefox/48.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:52.9) Gecko/20100101 Goanna/3.4 Firefox/52.9 PaleMoon/27.8.3
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:44.0) Gecko/20100101 Firefox/44.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:45.0) Gecko/20100101 Firefox/45.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:47.0) Gecko/20100101 Firefox/47.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:48.0) Gecko/20100101 Firefox/48.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:49.0) Gecko/20100101 Firefox/49.0.2.1 Waterfox/49.0.2.1
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:45.0) Gecko/20100101 Firefox/45.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:48.0) Gecko/20100101 Firefox/48.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:56.0) Gecko/20100101 Firefox/56.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:56.0; Waterfox) Gecko/20100101 Firefox/56.2.5
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:41.0) Gecko/20100101 Firefox/41.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:47.0) Gecko/20100101 Firefox/47.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:48.0) Gecko/20100101 Firefox/48.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:52.0) Gecko/20100101 Firefox/52.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:52.0) Gecko/20100101 Firefox/52.0 SeaMonkey/2.49.1
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:55.0) Gecko/20100101 Firefox/55.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:56.0) Gecko/20100101 Firefox/56.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:56.0) Gecko/20100101 Firefox/56.0.1 Waterfox/56.0.1
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:57.0) Gecko/20100101 Firefox/57.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:58.0) Gecko/20100101 Firefox/58.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:59.0) Gecko/20100101 Firefox/59.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:60.0) Gecko/20100101 Firefox/60.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:61.0) Gecko/20100101 Firefox/61.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:62.0) Gecko/20100101 Firefox/62.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:63.0) Gecko/20100101 Firefox/63.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.110 Safari/537.36
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/11.1.2 Safari/605.1.15
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_1; rv:50.0) Gecko/20100101 Firefox/50.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_1; rv:55.0) Gecko/20100101 Firefox/55.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_2; rv:49.0) Gecko/20100101 Firefox/49.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.102 Safari/537.36
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.110 Safari/537.36
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/12.0.1 Safari/605.1.15
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.110 Safari/537.36
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.80 Safari/537.36
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/11.1.2 Safari/605.1.15
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/12.0.1 Safari/605.1.15
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/12.0.2 Safari/605.1.15
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/12.0 Safari/605.1.15
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.110 Safari/537.36
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.110 Safari/537.36
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.80 Safari/537.36
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_1) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/12.0.1 Safari/605.1.15
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.110 Safari/537.36
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.80 Safari/537.36
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3639.1 Safari/537.36
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_2) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/12.0.2 Safari/605.1.15
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_29_81; rv:45.70.23) Gecko/20134284 Firefox/45.70.23
Mozilla/5.0 (Macintosh; Intel Mac OS X 11.11; rv:51.0) Gecko/20100101 Firefox/60.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 9.3; rv:45.0) Gecko/20100101 Firefox/57.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 9.3; rv:45.0) Gecko/20100101 Firefox/59.0.2
Mozilla/5.0 (Macintosh; PPC Mac OS X 10.11; rv:46.0) Gecko/20100101 Firefox/46.0
Mozilla/5.0 (Macintosh; PPC Mac OS X 10.12; rv:46.0) Gecko/20100101 Firefox/46.0
Mozilla/5.0 (Macintosh; PPC Mac OS X 10.12; rv:55.0) Gecko/20100101 Firefox/55.0
Mozilla/5.0 (Macintosh; PPC Mac OS X 10.4; FPR7; rv:45.0) Gecko/20100101 Firefox/45.0 TenFourFox/G5
Mozilla/5.0 (Macintosh; PPC Mac OS X 10.4; FPR8; rv:45.0) Gecko/20100101 Firefox/45.0 TenFourFox/G5
Mozilla/5.0 (Macintosh; PPC Mac OS X 10.4; FPR9; rv:45.0) Gecko/20100101 Firefox/45.0 TenFourFox/G5
Mozilla/5.0 (Macintosh; PPC Mac OS X 10.5; FPR8; rv:45.0) Gecko/20100101 Firefox/45.0 TenFourFox/7450
Mozilla/5.0 (Macintosh; PPC Mac OS X 10.8; rv:47.0) Gecko/20100101 Firefox/47.0
Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.10; rv:59.0) Gecko/20100101 Firefox/59.0
Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.10; rv:64.0) Gecko/20100101 Firefox/64.0
Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.10; rv:65.0) Gecko/20100101 Firefox/65.0
Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.11; rv:59.0) Gecko/20100101 Firefox/59.0
Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.11; rv:60.0) Gecko/20100101 Firefox/60.0
Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.11; rv:61.0) Gecko/20100101 Firefox/61.0
Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.11; rv:62.0) Gecko/20100101 Firefox/62.0
Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.12; rv:54.0) Gecko/20100101 Firefox/54.0
Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.12; rv:60.0) Gecko/20100101 Firefox/60.0
Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.12; rv:61.0) Gecko/20100101 Firefox/61.0
Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.12; rv:62.0) Gecko/20100101 Firefox/62.0
Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.13; rv:59.0) Gecko/20100101 Firefox/59.0
Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.13; rv:60.0) Gecko/20100101 Firefox/60.0
Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.13; rv:62.0) Gecko/20100101 Firefox/62.0
Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36
Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.110 Safari/537.36
Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.85 Safari/537.36
Mozilla/5.0 (Windows NT 10.0; WOW64; rv:20.0) Gecko/20100101 Firefox/60.0
Mozilla/5.0 (Windows NT 10.0; WOW64; rv:40.0) Gecko/20100101 Firefox/40.0
Mozilla/5.0 (Windows NT 10.0; WOW64; rv:40.0) Gecko/20100101 Firefox/45.0
Mozilla/5.0 (Windows NT 10.0; WOW64; rv:40.0) Gecko/20100101 IceDragon/40.1.1.18 Firefox/40.0.2
Mozilla/5.0 (Windows NT 10.0; WOW64; rv:41.0) Gecko/20100101 Firefox/41.0
Mozilla/5.0 (Windows NT 10.0; WOW64; rv:42.0) Gecko/20100101 Firefox/42.0
Mozilla/5.0 (Windows NT 10.0; WOW64; rv:43.0) Gecko/20100101 Firefox/43.0
Mozilla/5.0 (Windows NT 10.0; WOW64; rv:43.0) Gecko/20100101 Firefox/43.0 Framafox/43.0.1
Mozilla/5.0 (Windows NT 10.0; WOW64; rv:43.0) Gecko/20100101 Firefox/43.0 SeaMonkey/2.40
Mozilla/5.0 (Windows NT 10.0; WOW64; rv:44.0) Gecko/20100101 Firefox/44.0
Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) Gecko/20100101 Firefox/45.0
Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.63.16) Gecko/20175595 Firefox/45.63.16
Mozilla/5.0 (Windows NT 10.0; WOW64; rv:46.0) Gecko/20100101 Firefox/46.0
Mozilla/5.0 (Windows NT 10.0; WOW64; rv:47.0) Gecko/20100101 Firefox/47.0
Mozilla/5.0 (Windows NT 10.0; WOW64; rv:48.0) Gecko/20100101 Firefox/48.0
Mozilla/5.0 (Windows NT 10.0; WOW64; rv:49.0) Gecko/20100101 Firefox/49.0
Mozilla/5.0 (Windows NT 10.0; WOW64; rv:49.0) Gecko/20100101 Firefox/49.0 SeaMonkey/2.46
Mozilla/5.0 (Windows NT 10.0; WOW64; rv:50.0) Gecko/20100101 Firefox/50.0
Mozilla/5.0 (Windows NT 10.0; WOW64; rv:51.0) Gecko/20100101 Firefox/45.0
Mozilla/5.0 (Windows NT 10.0; WOW64; rv:51.0) Gecko/20100101 Firefox/47.0
Mozilla/5.0 (Windows NT 10.0; WOW64; rv:51.0) Gecko/20100101 Firefox/51.0
Mozilla/5.0 (Windows NT 10.0; WOW64; rv:51.0) Gecko/20100101 Firefox/60.0
Mozilla/5.0 (Windows NT 10.0; WOW64; rv:51.0) Gecko/20100101 Firefox/64.0
Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101 Firefox/52.0
Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101 Firefox/52.0 Cyberfox/52.9.1
Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101 Firefox/52.0 SeaMonkey/2.49.1
Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101 Firefox/52.0 SeaMonkey/2.49.2
Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101 Firefox/52.0 SeaMonkey/2.49.2 Lightning/5.4
Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101 Firefox/52.0 SeaMonkey/2.49.3
Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101 Firefox/52.0 SeaMonkey/2.49.4
Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101 Firefox/52.0 Zotero/5.0
Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.9) Gecko/20100101 Firefox/52.9
Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.9) Gecko/20100101 Goanna/3.4 Firefox/52.9 PaleMoon/27.6.2
Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.9) Gecko/20100101 Goanna/3.4 Firefox/52.9 PaleMoon/27.7.2
Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.9) Gecko/20100101 Goanna/3.4 Firefox/52.9 PaleMoon/27.8.2
Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.9) Gecko/20100101 Goanna/3.4 Firefox/52.9 PaleMoon/27.8.3
Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.9) Gecko/20100101 Goanna/3.4 Firefox/52.9 PaleMoon/27.9.0
Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.9) Gecko/20100101 Goanna/3.4 Firefox/52.9 PaleMoon/27.9.1
Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.9) Gecko/20100101 Goanna/3.4 Firefox/52.9 PaleMoon/27.9.2
Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.9) Gecko/20100101 Goanna/3.4 Firefox/52.9 PaleMoon/27.9.3
Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.9) Gecko/20100101 Goanna/4.1 Firefox/52.9 Basilisk/20180927
Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.9) Gecko/20100101 Goanna/4.1 Firefox/52.9 PaleMoon/28.0.0
Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.9) Gecko/20100101 Goanna/4.1 Firefox/52.9 PaleMoon/28.0.0a2
Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.9) Gecko/20100101 Goanna/4.1 Firefox/52.9 PaleMoon/28.0.1
Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.9) Gecko/20100101 Goanna/4.1 Firefox/52.9 PaleMoon/28.1.0
Mozilla/5.0 (Windows NT 10.0; WOW64; rv:53.0) Gecko/20100101 Firefox/53.0
Mozilla/5.0 (Windows NT 10.0; WOW64; rv:54.0) Gecko/20100101 Firefox/54.0
Mozilla/5.0 (Windows NT 10.0; WOW64; rv:55.0) Gecko/20100101 Firefox/55.0
Mozilla/5.0 (Windows NT 10.0; WOW64; rv:56.0) Gecko/20100101 Firefox/50.0
Mozilla/5.0 (Windows NT 10.0; WOW64; rv:56.0) Gecko/20100101 Firefox/56.0
Mozilla/5.0 (Windows NT 10.0; WOW64; rv:56.0) Gecko/20100101 Firefox/56.0 SeaMonkey/2.49.3
Mozilla/5.0 (Windows NT 10.0; WOW64; rv:56.0) Gecko/20100101 Firefox/57.0
Mozilla/5.0 (Windows NT 10.0; WOW64; rv:57.0) Gecko/20100101 Firefox/57.0
Mozilla/5.0 (Windows NT 10.0; WOW64; rv:58.0) Gecko/20100101 Firefox/58.0
Mozilla/5.0 (Windows NT 10.0; WOW64; rv:58.0) Gecko/20100101 Firefox/58.0 IceDragon/58.0.1
Mozilla/5.0 (Windows NT 10.0; WOW64; rv:59.0) Gecko/20100101 Firefox/59.0
Mozilla/5.0 (Windows NT 10.0; WOW64; rv:60.0) Gecko/20100101 Firefox/60.0
Mozilla/5.0 (Windows NT 10.0; WOW64; rv:60.0) Gecko/20100101 Firefox/60.0 IceDragon/60.0.2
Mozilla/5.0 (Windows NT 10.0; WOW64; rv:60.9) Gecko/20100101 Goanna/4.1 Firefox/60.9 PaleMoon/28.2.1
Mozilla/5.0 (Windows NT 10.0; WOW64; rv:61.0) Gecko/20100101 Firefox/61.0
Mozilla/5.0 (Windows NT 10.0; WOW64; rv:61.0) Gecko/20100101 Firefox/61.0 IceDragon/61.0
Mozilla/5.0 (Windows NT 10.0; WOW64; rv:62.0) Gecko/20100101 Firefox/62.0
Mozilla/5.0 (Windows NT 10.0; WOW64; rv:62.0) Gecko/20100101 Firefox/62.0 IceDragon/62.0.2
Mozilla/5.0 (Windows NT 10.0; WOW64; rv:63.0) Gecko/20100101 Firefox/63.0
Mozilla/5.0 (Windows NT 10.0; WOW64; rv:64.0) Gecko/20100101 Firefox/64.0
Mozilla/5.0 (Windows NT 10.0; WOW64; rv:65.0) Gecko/20100101 Firefox/65.0
Mozilla/5.0 (Windows NT 10.0; Win64; rv:54.0) Gecko/20100101 Firefox/54.0
Mozilla/5.0 (Windows NT 10.0; Win64; rv:55.0) Gecko/20100101 Firefox/55.0
Mozilla/5.0 (Windows NT 10.0; Win64; rv:59.0) Gecko/20100101 Firefox/59.0
Mozilla/5.0 (Windows NT 10.0; Win64; rv:60.0) Gecko/20100101 Firefox/60.0
Mozilla/5.0 (Windows NT 10.0; Win64; rv:61.0) Gecko/20100101 Firefox/61.0
Mozilla/5.0 (Windows NT 10.0; Win64; rv:61.0) Gecko/20100101 Firefox/62.0
Mozilla/5.0 (Windows NT 10.0; Win64; rv:62.0) Gecko/20100101 Firefox/62.0
Mozilla/5.0 (Windows NT 10.0; Win64; rv:63.0) Gecko/20100101 Firefox/63.0
Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:41.0) Gecko/20100101 Firefox/41.0
Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:43.0) Gecko/20100101 Firefox/43.0
Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:43.0) Gecko/20100101 Firefox/43.0.4 Waterfox/43.0.4
Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:45.0) Gecko/20100101 Firefox/45.0
Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:46.0) Gecko/20100101 Firefox/46.0.1 Waterfox/46.0.1
Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:47.0) Gecko/20100101 Firefox/47.0
Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:49.0) Gecko/20100101 Firefox/49.0
Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:50.0) Gecko/20100101 Firefox/50.0
Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:51.0) Gecko/20100101 Firefox/51.0
Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:52.0) Gecko/20100101 Firefox/52.0
Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:52.0) Gecko/20100101 Firefox/52.0 Cyberfox/52.0.4
Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:52.0) Gecko/20100101 Firefox/52.0 Cyberfox/52.5.0
Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:52.0) Gecko/20100101 Firefox/52.0 Cyberfox/52.5.2
Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:52.0) Gecko/20100101 Firefox/52.0 Cyberfox/52.7.2
Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:52.0) Gecko/20100101 Firefox/52.0 Cyberfox/52.7.4
Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:52.0) Gecko/20100101 Firefox/52.0 Cyberfox/52.8.0
Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:52.0) Gecko/20100101 Firefox/52.0 Cyberfox/52.9.1
Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:52.0) Gecko/20100101 Firefox/52.0.2 Waterfox/52.0.2
Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:52.0) Gecko/20100101 Firefox/59.0
Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:52.9) Gecko/20100101 Goanna/3.3 Firefox/52.9 PaleMoon/27.5.1
Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:52.9) Gecko/20100101 Goanna/3.4 Firefox/52.9 PaleMoon/27.8.3
Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:52.9) Gecko/20100101 Goanna/3.4 Firefox/52.9 PaleMoon/27.9.0
Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:52.9) Gecko/20100101 Goanna/3.4 Firefox/52.9 PaleMoon/27.9.1
Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:52.9) Gecko/20100101 Goanna/3.4 Firefox/52.9 PaleMoon/27.9.2
Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:52.9) Gecko/20100101 Goanna/3.4 Firefox/52.9 PaleMoon/27.9.3
Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:52.9) Gecko/20100101 Goanna/3.4 Firefox/52.9 PaleMoon/27.9.4
Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:52.9) Gecko/20100101 Goanna/4.1 Firefox/52.9 Basilisk/20180424
Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:52.9) Gecko/20100101 Goanna/4.1 Firefox/52.9 Basilisk/20180515
Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:52.9) Gecko/20100101 Goanna/4.1 Firefox/52.9 Basilisk/20180601
Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:52.9) Gecko/20100101 Goanna/4.1 Firefox/52.9 Basilisk/20180718
Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:52.9) Gecko/20100101 Goanna/4.1 Firefox/52.9 Basilisk/20180905
Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:52.9) Gecko/20100101 Goanna/4.1 Firefox/52.9 Basilisk/20180927
Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:52.9) Gecko/20100101 Goanna/4.1 Firefox/52.9 PaleMoon/28.0.0
Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:52.9) Gecko/20100101 Goanna/4.1 Firefox/52.9 PaleMoon/28.0.0.1
Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:52.9) Gecko/20100101 Goanna/4.1 Firefox/52.9 PaleMoon/28.0.1
Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:52.9) Gecko/20100101 Goanna/4.1 Firefox/52.9 PaleMoon/28.1.0
Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:53.0) Gecko/20100101 Firefox/53.0
Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:54.0) Gecko/20100101 Firefox/54.0
Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:54.0) Gecko/20100101 Firefox/54.0.1 Waterfox/54.0.1
Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:55.0) Gecko/20100101 Firefox/55.0
Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:56.0) Gecko/20100101 Firefox/56.0
Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:56.0) Gecko/20100101 Firefox/56.0.1 Waterfox/56.0.1
Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:56.0) Gecko/20100101 Firefox/56.0.4 Waterfox/56.0.4
Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:56.0; Waterfox) Gecko/20100101 Firefox/56.2.3
Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:56.0; Waterfox) Gecko/20100101 Firefox/56.2.4
Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:56.0; Waterfox) Gecko/20100101 Firefox/56.2.5
Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:57.0) Gecko/20100101 Firefox/57.0
Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:57.0) Gecko/20100101 Firefox/59.0
Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:58.0) Gecko/20100101 Firefox/58.0
Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:59.0) Gecko/20100101 Firefox/59.0
Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:60.0) Gecko/20100101 Firefox/60.0
Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:62.0) Gecko/20100101 Firefox/62.0
Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:63.0) Gecko/20100101 Firefox/63.0
Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:64.0) Gecko/20100101 Firefox/64.0
Mozilla/5.0 (Windows NT 10.0; rv:44.0) Gecko/20100101 Firefox/44.0.1
Mozilla/5.0 (Windows NT 10.0; rv:45.0) Gecko/20100101 Firefox/45.0
Mozilla/5.0 (Windows NT 10.0; rv:47.0) Gecko/20100101 Firefox/47.0
Mozilla/5.0 (Windows NT 10.0; rv:49.0) Gecko/20100101 Firefox/49.0
Mozilla/5.0 (Windows NT 10.0; rv:50.0) Gecko/20100101 Firefox/50.0
Mozilla/5.0 (Windows NT 10.0; rv:51.0) Gecko/20100101 Firefox/51.0
Mozilla/5.0 (Windows NT 10.0; rv:52.0) Gecko/20100101 Firefox/52.0
Mozilla/5.0 (Windows NT 10.0; rv:52.0) Gecko/20100101 Firefox/52.0 Cyberfox/52.7.2
Mozilla/5.0 (Windows NT 10.0; rv:52.0) Gecko/20100101 Firefox/52.0 Cyberfox/52.9.1
Mozilla/5.0 (Windows NT 10.0; rv:52.0) Gecko/20100101 Firefox/52.0 SeaMonkey/2.49.4
Mozilla/5.0 (Windows NT 10.0; rv:52.9) Gecko/20100101 Goanna/3.4 Firefox/52.9 PaleMoon/27.9.1
Mozilla/5.0 (Windows NT 10.0; rv:52.9) Gecko/20100101 Goanna/3.4 Firefox/52.9 PaleMoon/27.9.1a1
Mozilla/5.0 (Windows NT 10.0; rv:52.9) Gecko/20100101 Goanna/3.4 Firefox/52.9 PaleMoon/27.9.3
Mozilla/5.0 (Windows NT 10.0; rv:52.9) Gecko/20100101 Goanna/4.1 Firefox/52.9 PaleMoon/28.1.0
Mozilla/5.0 (Windows NT 10.0; rv:53.0) Gecko/20100101 Firefox/53.0
Mozilla/5.0 (Windows NT 10.0; rv:55.0) Gecko/20100101 Firefox/55.0
Mozilla/5.0 (Windows NT 10.0; rv:56.0) Gecko/20100101 Firefox/56.0
Mozilla/5.0 (Windows NT 10.0; rv:57.0) Gecko/20100101 Firefox/57.0
Mozilla/5.0 (Windows NT 10.0; rv:58.0) Gecko/20100101 Firefox/58.0
Mozilla/5.0 (Windows NT 10.0; rv:59.0) Gecko/20100101 Firefox/59.0
Mozilla/5.0 (Windows NT 10.0; rv:60.0) Gecko/20100101 Firefox/60.0
Mozilla/5.0 (Windows NT 10.0; rv:61.0) Gecko/20100101 Firefox/61.0
Mozilla/5.0 (Windows NT 10.0; rv:62.0) Gecko/20100101 Firefox/62.0
Mozilla/5.0 (Windows NT 10.0; rv:63.0) Gecko/20100101 Firefox/63.0
Mozilla/5.0 (Windows NT 10.0; rv:64.0) Gecko/20100101 Firefox/64.0
Mozilla/5.0 (Windows NT 4.0; rv:52.0) Gecko/20100101 Firefox/52.0
Mozilla/5.0 (Windows NT 5.1; WOW64; rv:51.0) Gecko/20100101 Firefox/51.0
Mozilla/5.0 (Windows NT 5.1; WOW64; rv:52.0) Gecko/20100101 Firefox/52.0
Mozilla/5.0 (Windows NT 5.1; WOW64; rv:61.0) Gecko/20100101 Firefox/61.0
Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36
Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.110 Safari/537.36
Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36
Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36
Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.110 Safari/537.36
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:55.0) Gecko/20100101 Firefox/55.0
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:63.0) Gecko/20100101 Firefox/63.0
Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.90 Safari/537.36
Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36
Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36
Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.110 Safari/537.36
Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36
Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.80 Safari/537.36
Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36
Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:52.0) Gecko/20100101 Firefox/52.0
Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:60.0) Gecko/20100101 Firefox/60.0
Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:63.0) Gecko/20100101 Firefox/63.0
Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:64.0) Gecko/20100101 Firefox/64.0
Mozilla/5.0 (Windows NT 6.1; rv:60.0) Gecko/20100101 Firefox/60.0
Mozilla/5.0 (Windows NT 6.1; rv:63.0) Gecko/20100101 Firefox/63.0
Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36
Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.110 Safari/537.36
Mozilla/5.0 (Windows NT 6.3; Win64; x64; rv:63.0) Gecko/20100101 Firefox/63.0
Mozilla/5.0 (X11; CrOS x86_64 11021.81.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.110 Safari/537.36
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.106 Safari/537.36
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.101 Safari/537.36
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.110 Safari/537.36
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.67 Safari/537.36
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.80 Safari/537.36
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/70.0.3538.77 Chrome/70.0.3538.77 Safari/537.36
Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Firefox/52.0
Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Firefox/60.0
Mozilla/5.0 (X11; Linux x86_64; rv:63.0) Gecko/20100101 Firefox/63.0
Mozilla/5.0 (X11; Linux x86_64; rv:64.0) Gecko/20100101 Firefox/64.0
Mozilla/5.0 (X11; OpenBSD amd64; rv:56.0) Gecko/20100101 Firefox/66.0
Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:63.0) Gecko/20100101 Firefox/63.0
Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:64.0) Gecko/20100101 Firefox/64.0