berkmancenter / mediacloud
57%
master: 70%

LAST BUILD BRANCH: release
DEFAULT BRANCH: master
Repo Added 11 Jul 2014 12:47PM UTC
Files 335
LAST BUILD ON BRANCH extractor_requeue_locked_media

Build: 3570 (pending completion)
Type: push
Via: travis-ci
Committer: hroberts
requeue media locked extractor jobs to avoid media congestion

Our extractor job pool can get stuck on a single choke point in the sentence deduping
code, where each job has to wait for a lock on the story's media_id.  If the pool gets a
long series of stories with the same media_id, it ends up processing those jobs one at a
time until the series is cleared.

This commit mitigates the issue by checking, at the very beginning of the extract_and_vector
job, whether the story's media_id is currently locked, using a non-blocking call.  If the
media_id is locked, the job requeues itself at low priority and exits.  A brief backlog can
still form when a few stories from the same media_id are first put into the queue, but it
should clear within a couple of minutes instead of potentially blocking for a couple of
days, as has happened a couple of times.
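
The check-and-requeue pattern described above can be sketched roughly as follows. This is a minimal illustration only, not the project's code: the in-memory `media_locks` table, the `jobs` priority queue, and the priority constants are hypothetical stand-ins, since the commit message does not say which lock mechanism or job broker is used.

```python
import queue
import threading
from collections import defaultdict

# Hypothetical stand-ins for the real lock mechanism and job broker,
# neither of which is named in the commit message.
HIGH_PRIORITY, LOW_PRIORITY = 0, 1

media_locks = defaultdict(threading.Lock)  # one lock per media_id
jobs = queue.PriorityQueue()               # items: (priority, stories_id, media_id)


def extract_and_vector(stories_id, media_id):
    """Return "done" if the story was processed, "requeued" if its media_id was locked."""
    lock = media_locks[media_id]

    # Non-blocking attempt: returns False immediately instead of waiting for the lock.
    if not lock.acquire(blocking=False):
        # Another job holds this media_id; step aside so the worker stays free.
        jobs.put((LOW_PRIORITY, stories_id, media_id))
        return "requeued"

    try:
        # ... extraction and sentence deduping would run here ...
        return "done"
    finally:
        lock.release()
```

The point of the non-blocking call is that a worker never sits idle waiting on one media_id; it hands the story back to the queue and moves on to unrelated work.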

8394 of 14850 relevant lines covered (56.53%)

1581.89 hits per line

Recent builds

Build: 3570
Branch: extractor_requeue_locked_media
Commit: requeue media locked extractor jobs to avoid media congestion
Type: push
Ran: 17 Apr 2017 08:26PM UTC
Committer: hroberts
Via: travis-ci
Coverage: pending completion
See All Builds (4335)