berkmancenter / mediacloud
Coverage: 48% (master: 70%)

LAST BUILD BRANCH: release
DEFAULT BRANCH: master
Repo Added 11 Jul 2014 12:47PM UTC
Files 335

LAST BUILD ON BRANCH fix_dup_media

Build: 2611 (pending completion)
Type: push
Via: travis-ci
Committer: hroberts
restore command line prints and use public set membership and number of stories as sorting metrics

The prints in this script were turned into logging messages in a log4perl update. They should
be prints (or says) because this is an interactive command-line script.

After updating dup_media_id in the database so that the parent medium is the one we want to search,
we need to update this deduping script to encourage choosing the best medium for searching. We do
that by sorting the possible medium alternatives by public set membership and by number of stories
in the last year.

Also removed inlink querying, because we no longer need it now that we are looking at the number of
stories and because the inlink query is very slow. Also optimized some of the queries so that the
script takes only a minute or two to get started.
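
The sorting described in the commit message can be illustrated roughly as follows. This is a minimal Python sketch, not the script's actual code (the repository's script is not shown here): the `Medium` class, its field names, and `rank_candidates` are all hypothetical, and it assumes public set membership takes priority over story count.

```python
from dataclasses import dataclass


@dataclass
class Medium:
    # All fields are hypothetical stand-ins for whatever the real script queries.
    media_id: int
    name: str
    in_public_set: bool     # assumed: medium belongs to at least one public set
    stories_last_year: int  # assumed: number of stories in the last year


def rank_candidates(candidates):
    """Order candidates best-first: public-set members, then most stories."""
    return sorted(
        candidates,
        key=lambda m: (m.in_public_set, m.stories_last_year),
        reverse=True,
    )


candidates = [
    Medium(1, "example.com", False, 900),
    Medium(2, "www.example.com", True, 120),
    Medium(3, "m.example.com", True, 450),
]

ranked = rank_candidates(candidates)
print([m.media_id for m in ranked])  # [3, 2, 1]
```

Under these assumptions a medium outside every public set ranks last even with many stories, which matches the stated goal of steering the choice toward the medium people actually search.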

6915 of 14324 relevant lines covered (48.28%)

86.03 hits per line
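
The coverage percentage above follows directly from the two line counts. A small Python check, assuming the figure is simply covered lines divided by relevant lines and rounded to two decimals (the function name and the handling of zero relevant lines are assumptions, not Coveralls' documented behaviour):

```python
def coverage_percent(covered_lines, relevant_lines):
    """Return line coverage as a percentage rounded to two decimals."""
    if relevant_lines == 0:
        # Assumed convention for an empty build; not taken from the report.
        return 0.0
    return round(100.0 * covered_lines / relevant_lines, 2)


print(coverage_percent(6915, 14324))  # 48.28
```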


Recent builds

Build: 2611
Branch: fix_dup_media
Commit: restore command line prints and use public set membership and number of stories as sorting metrics prints in this script got turned into logging message in a log4perl update. These should actually be prints (or says) because this is an interacti...
Type: push
Ran: 14 Sep 2016 06:15PM UTC
Committer: hroberts
Via: travis-ci
Coverage: pending completion
See All Builds (4335)
© 2025 Coveralls, Inc