berkmancenter / mediacloud
Coverage: 54%
master: 70%

LAST BUILD BRANCH: release
DEFAULT BRANCH: master
Repo Added 11 Jul 2014 12:47PM UTC
Files 335

LAST BUILD ON BRANCH fix_topic_word_count

Build: 4453 (pending completion)
Type: push
Via: travis-ci
Committer: pypt
Join sentences and stories to temporary table instead of using IN()

If the temporary table is empty (as seems to be the case with some Solr
queries that return no sentences), PostgreSQL decides to do a
sequential scan on the "stories" table:

mediacloud=# explain SELECT sentence,
                            stories.language AS story_language
                     FROM story_sentences
                         INNER JOIN stories
                             ON story_sentences.stories_id = stories.stories_id
                     WHERE story_sentences_id IN (
                         SELECT id
                         FROM empty_temporary_table
                     );
                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=1349980901.31..1437936383.03 rows=4829242229 width=136)
   Merge Cond: (stories.stories_id = story_sentences.stories_id)
   ->  Sort  (cost=177068210.59..178802946.28 rows=693894273 width=6)
         Sort Key: stories.stories_id
         ->  Seq Scan on stories  (cost=0.00..69748298.73 rows=693894273 width=6)
   ->  Materialize  (cost=1172912448.35..1197058659.50 rows=4829242229 width=138)
         ->  Sort  (cost=1172912448.35..1184985553.93 rows=4829242229 width=138)
               Sort Key: story_sentences.stories_id
               ->  Nested Loop  (cost=38.96..587.75 rows=4829242229 width=138)
                     ->  HashAggregate  (cost=38.25..40.25 rows=200 width=8)
                           Group Key: xxx.id
                           ->  Seq Scan on xxx  (cost=0.00..32.60 rows=2260 width=8)
                     ->  Index Scan using story_sentences_pkey on story_sentences  (cost=0.71..2.73 rows=1 width=146)
                           Index Cond: (story_sentences_id = xxx.id)

References #233.
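
The commit message shows only the problematic IN() form; the rewritten query itself is not included on this page. As a rough sketch of the idea in the commit title, the example below puts the IN() form next to a form that joins directly against the temporary table and checks that both return the same rows. It assumes the ids in the temporary table are unique (otherwise a plain join would duplicate rows). The table name temp_sentence_ids and the sample data are hypothetical, and SQLite is used only as a runnable stand-in: it does not reproduce PostgreSQL's planner behaviour described above.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory schema loosely modelled on the tables named above."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE stories (stories_id INTEGER PRIMARY KEY, language TEXT);
        CREATE TABLE story_sentences (
            story_sentences_id INTEGER PRIMARY KEY,
            stories_id INTEGER REFERENCES stories (stories_id),
            sentence TEXT
        );
        -- Hypothetical temporary table holding sentence ids (unique by design).
        CREATE TEMPORARY TABLE temp_sentence_ids (id INTEGER PRIMARY KEY);
        INSERT INTO stories VALUES (1, 'en'), (2, 'lt');
        INSERT INTO story_sentences VALUES
            (10, 1, 'First sentence.'),
            (11, 1, 'Second sentence.'),
            (20, 2, 'Labas rytas.');
    """)
    return conn


# Form quoted in the commit message: filter with IN (subquery).
IN_QUERY = """
    SELECT sentence, stories.language AS story_language
    FROM story_sentences
        INNER JOIN stories
            ON story_sentences.stories_id = stories.stories_id
    WHERE story_sentences_id IN (SELECT id FROM temp_sentence_ids)
    ORDER BY story_sentences_id
"""

# Assumed rewrite: start from the temporary table and join outwards.
JOIN_QUERY = """
    SELECT sentence, stories.language AS story_language
    FROM temp_sentence_ids
        INNER JOIN story_sentences
            ON story_sentences.story_sentences_id = temp_sentence_ids.id
        INNER JOIN stories
            ON story_sentences.stories_id = stories.stories_id
    ORDER BY story_sentences.story_sentences_id
"""


def fetch(conn, query, ids):
    """Refill the temporary table with the given ids and run the query."""
    conn.execute("DELETE FROM temp_sentence_ids")
    conn.executemany(
        "INSERT INTO temp_sentence_ids (id) VALUES (?)", [(i,) for i in ids]
    )
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    for ids in ([], [10, 20]):
        print(ids, fetch(conn, IN_QUERY, ids), fetch(conn, JOIN_QUERY, ids))
```

On PostgreSQL, the plans of the two forms could be compared with EXPLAIN, as in the commit message; whether the join form avoids the sequential scan depends on the planner and table statistics and is not something this sketch can show.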

8027 of 14911 relevant lines covered (53.83%)

977.45 hits per line
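
The percentage above is consistent with dividing covered lines by relevant lines. A minimal check, assuming that is how the figure is derived (the function name is illustrative, not part of any Coveralls API):

```python
def coverage_percent(covered_lines, relevant_lines):
    """Return line coverage as a percentage rounded to two decimals."""
    return round(100.0 * covered_lines / relevant_lines, 2)


if __name__ == "__main__":
    # Figures taken from the build summary above.
    print(coverage_percent(8027, 14911))
```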

Source Files on fix_topic_word_count
  • List 0
  • Changed 0
  • Source Changed 0
  • Coverage Changed 0

Recent builds

Build: 4453
Branch: fix_topic_word_count
Commit: Join sentences and stories to temporary table instead of using IN() If temporary table is empty (as seems to be the case with some Solr queries which don't return any sentences), PostgreSQL decides to do a sequential scan on "stories" table: med...
Type: push
Ran: 04 Dec 2017 09:43AM UTC
Committer: pypt
Via: travis-ci
Coverage: pending completion