Field Options
The package performs several parsing steps to facilitate the analysis of archived tweets and types of tweets. The fields below are available and can be passed to Parse and Export. The command line tool returns all of these fields.
`archived_urlkey`
: (str) A canonical transformation of the URL you supplied, for example, `org,eserver,tc)/`. Such keys are useful for indexing.

`archived_timestamp`
: (str) A 14-digit date-time representation in the `YYYYMMDDhhmmss` format.

`parsed_archived_timestamp`
: (str) The `archived_timestamp` in human-readable format.

`archived_tweet_url`
: (str) The archived URL.

`parsed_archived_tweet_url`
: (str) The archived URL after parsing. The parsed URL is not guaranteed to have been archived; it is provided as a convenience, because the originally archived URL does not always exist, owing to changes in the URLs and web services of the social network Twitter. Check the Utils.

`original_tweet_url`
: (str) The original tweet URL.

`parsed_tweet_url`
: (str) The original tweet URL after parsing. Old URLs were archived in a nested manner. The parsing applied here unnests these URLs when necessary. Check the Utils.

`available_tweet_text`
: (str) The tweet text extracted from the URL that is still available on the Twitter account.

`available_tweet_is_RT`
: (bool) Whether the tweet from the `available_tweet_text` field is a retweet or not.

`available_tweet_info`
: (str) Name and date of the tweet from the `available_tweet_text` field.

`archived_mimetype`
: (str) The mimetype of the archived content, which can be one of these: `text/html`, `warc/revisit`, `application/json`, or `unk`.

`archived_statuscode`
: (str) The HTTP status code of the snapshot. If the mimetype is `warc/revisit`, the value returned for the `statuscode` key can be blank, but the actual value is the same as that of any other entry that has the same `digest` as this entry. If the mimetype is `application/json`, the value is usually empty or `-`.

`archived_digest`
: (str) The `SHA1` hash digest of the content, excluding the headers. It's usually a base-32-encoded string.

`archived_length`
: (int) The compressed byte size of the corresponding WARC record, which includes WARC headers, HTTP headers, and content payload.
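
As an illustration of how a few of these fields might be post-processed, here is a minimal sketch that uses only the Python standard library. It is not part of the package: the helper names (`parse_timestamp`, `digest_to_hex`, `fill_revisit_status`) and the sample rows are hypothetical, and treating `-` as a blank status code on `warc/revisit` rows is an assumption.

```python
import base64
import hashlib
from datetime import datetime


def parse_timestamp(archived_timestamp: str) -> datetime:
    """Convert a 14-digit YYYYMMDDhhmmss string into a naive datetime."""
    return datetime.strptime(archived_timestamp, "%Y%m%d%H%M%S")


def digest_to_hex(archived_digest: str) -> str:
    """Decode a base-32-encoded SHA-1 digest into its hexadecimal form."""
    return base64.b32decode(archived_digest).hex()


def fill_revisit_status(rows: list[dict]) -> list[dict]:
    """Fill blank status codes on warc/revisit rows.

    The status code is copied from another row that has the same digest,
    following the rule described for archived_statuscode. Rows are copied,
    so the input list is left unchanged.
    """
    blank = ("", "-")  # assumption: "-" is also treated as blank
    known: dict[str, str] = {}
    for row in rows:
        code = row["archived_statuscode"]
        if code not in blank:
            known.setdefault(row["archived_digest"], code)

    filled = []
    for row in rows:
        row = dict(row)
        if (
            row["archived_mimetype"] == "warc/revisit"
            and row["archived_statuscode"] in blank
        ):
            row["archived_statuscode"] = known.get(
                row["archived_digest"], row["archived_statuscode"]
            )
        filled.append(row)
    return filled


# Hypothetical sample data: the digest is generated locally for illustration.
sample_digest = base64.b32encode(hashlib.sha1(b"example payload").digest()).decode()
sample_rows = [
    {
        "archived_timestamp": "20230115123045",
        "archived_mimetype": "text/html",
        "archived_statuscode": "200",
        "archived_digest": sample_digest,
    },
    {
        "archived_timestamp": "20230220080000",
        "archived_mimetype": "warc/revisit",
        "archived_statuscode": "",
        "archived_digest": sample_digest,
    },
]

for row in fill_revisit_status(sample_rows):
    print(
        parse_timestamp(row["archived_timestamp"]).isoformat(),
        row["archived_statuscode"],
        digest_to_hex(row["archived_digest"]),
    )
```

The revisit lookup only works when at least one row with the same digest carries a non-blank status code. Otherwise the original value is left in place.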