
berkmancenter / mediacloud
64%
master: 70%

LAST BUILD BRANCH: release
DEFAULT BRANCH: master
Repo Added 11 Jul 2014 12:47PM UTC
Files 335
LAST BUILD ON BRANCH log_running_jobs

pending completion
5642

push

travis-ci

hroberts
optimize query that provides downloads to the crawler

The existing query that provides pending downloads to the crawler
behaved poorly when there were many pending downloads and those
downloads were heavily concentrated in a single source.  As a result,
the vast majority of the pending downloads getting into the
intermediate queue were from a couple dozen sites.  The downloads
from that intermediate queue were being throttled to avoid downloading
more than once per second from a given site, so the crawler was getting
very few downloads into the fetchers.

The updated query limits the number of downloads returned to one
million but sorts those downloads by the rank for the given site.  This
approach greatly reduces the impact of the couple dozen sites with tens
of thousands of downloads in the queue, allowing the crawler to work
through the download queue much more quickly.
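
The effect of ranking by site can be sketched in a few lines of Python. This is only an illustration, not the query from the commit: the `Download` type, the `rank_by_site` helper, the example host names, and the choice to rank first and truncate afterwards are all assumptions made for the sketch.

```python
from collections import defaultdict
from typing import NamedTuple


class Download(NamedTuple):
    downloads_id: int
    site: str  # hypothetical site key, e.g. a host name


def rank_by_site(pending: list[Download], limit: int) -> list[Download]:
    """Order downloads by their rank within their own site, then truncate."""
    seen: dict[str, int] = defaultdict(int)
    ranked = []
    # Walk downloads oldest-first so rank 1 is each site's oldest download.
    for d in sorted(pending, key=lambda d: d.downloads_id):
        seen[d.site] += 1
        ranked.append((seen[d.site], d.downloads_id, d))
    # All rank-1 downloads come first, then all rank-2 downloads, and so on.
    ranked.sort(key=lambda t: (t[0], t[1]))
    return [d for _, _, d in ranked[:limit]]


# One site with six pending downloads and two sites with one each.
pending = [Download(i, "big.example") for i in range(1, 7)]
pending += [Download(7, "a.example"), Download(8, "b.example")]

head = rank_by_site(pending, limit=4)
print([d.downloads_id for d in head])  # [1, 7, 8, 2]
```

Taking the first four downloads by id alone would return only `big.example` downloads, which the per-site throttle would then release one per second. With the rank ordering, three different sites appear in the first four slots.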

5 of 5 new or added lines in 1 file covered. (100.0%)

14354 of 22437 relevant lines covered (63.97%)

634.98 hits per line

Recent builds

Build: 5642
Branch: log_running_jobs
Commit: optimize query that provides downloads to the crawler
Type: push
Ran: 11 Sep 2018 10:27PM UTC
Committer: hroberts
Via: travis-ci
Coverage: pending completion