|
What's with the version numbers?

The sources for the charts vary so widely in quality that there are bound to be mistakes in the way the data is assembled. There is a continual effort to fix the most obvious problems in the data, and to add additional charts as they become available. The version number tracks these changes, so if an issue is identified it can be tied to the version of the data in which it was fixed. The version number of this particular data set is 2.8.0050.

There is a CSV file that contains all the song and album entries listed on the site. The file csv/top5000songs-2-8-0050.csv contains a listing of the top 5000 songs (in case 1000 is not enough for you) and the file csv/top3000albums-2-8-0050.csv contains a listing of the top 3000 albums.

CSV File: impact-chart-2-8-0050.csv

The file impact-chart-2-8-0050.csv contains a complete set of the data published on this site in a form that is both convenient (since it is in a single place) and easy to manipulate. Each entry in the file has the following columns:
Technical Details: The format used is a conventional "Comma Separated Values" file that uses strictly ASCII encoding (i.e. the bottom 7 bits only). The Windows <CR><LF> line end sequence is used (0x0D 0x0A). The first row defines the names of the columns using lower case alphabetic characters plus underscore (code 0x5F). Within the actual data every item is enclosed within double quotes (code 0x22). There are no escape sequences in the data since the double quote character does not appear in any data value (at least it shouldn't). However, single quotes (code 0x27) and commas (code 0x2C) do occur within the data.

Note: If you load this data into Excel you will notice that some of the entries get screwed up; for example the song "1-2-3" gets converted to "01/Feb/2003" (and hence to the number 37653). This is because of the stupid way Excel treats things it thinks could possibly be dates. There are ways to stop it doing that, but you'll have to ask Google for details.

CSV Files: csv/top5000songs-2-8-0050.csv and csv/top3000albums-2-8-0050.csv

These two files list the top 5000 songs and 3000 albums (including some that are not listed anywhere on the site). The listing showing which charts each entry was in has been removed; only the summary scores are provided. Each entry in these files has the following columns:
The four regions are given weights that match the total value of music industry sales: usa 35%, eng 20%, eur 25%, row 20%. In practice we don't have enough entries for each region in every year; the way we fix that is described elsewhere. If you want your spreadsheet to get the same final score as we do, then email us.

Year Factors File: csv/yearfactor-2-8-0050.csv

The current set of year factors is provided in this CSV file. For each year it has the factors that are applied to songs and albums across each of the four regions.

Old CSV files

These contain the results of calculations of the top artists and songs from older versions of the data. They are here only for historical reasons; doing the same calculation with the most recent data would usually deliver better results.
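The regional weighting quoted above (usa 35%, eng 20%, eur 25%, row 20%) can be sketched in a few lines of Python. This is only an illustration of the weighting step, not the site's full calculation: the function name, the shape of the inputs, and in particular the idea that a year factor simply multiplies each regional score are assumptions made for this sketch.

```python
# Regional weights as quoted in the text (shares of music industry sales).
REGION_WEIGHTS = {"usa": 0.35, "eng": 0.20, "eur": 0.25, "row": 0.20}


def combined_score(regional_scores, year_factors=None):
    """Return the weighted sum of per-region scores.

    regional_scores: mapping of region name -> raw score; a missing
        region counts as 0.
    year_factors: optional mapping of region name -> factor. Multiplying
        the raw score by this factor is an ASSUMPTION of this sketch,
        not the documented formula.
    """
    year_factors = year_factors or {}
    total = 0.0
    for region, weight in REGION_WEIGHTS.items():
        raw = regional_scores.get(region, 0.0)
        factor = year_factors.get(region, 1.0)
        total += weight * factor * raw
    return total


# Invented regional scores, purely for illustration.
example = {"usa": 12.0, "eng": 8.0, "eur": 4.0, "row": 0.0}
print(round(combined_score(example), 3))
```

Because the weights sum to 1, an entry with the same score in every region keeps that score after weighting, which makes a handy sanity check for a spreadsheet version of the same step.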
Old Factor Values

As the volume of data from different eras and locations changes we adjust the year factors to balance out the scores. This is done by investigating the songs and albums in positions 1000-2000 and ensuring that after processing they have a reasonable distribution. This technique ensures that the top 1000 items don't influence the factors, but forces us to keep explicit values (rather than calculating the factors on the fly). Here are the values of the factor tables for various versions:
Version History

Here are some of the highlights of the released versions:
The comments here are from the MusicID impact site. This version is not able to accept comments yet.

Previous Comments (newest first)

12 Jun 2018
You forgot to write some highlights for versions after 2.8.001. I found the chart now is different from 2.8.001, even 2.8.007.

The notes only highlight major changes of algorithm. The whole point is that EVERY version is different; that's why there are version numbers.

16 May 2018
albums below 3000
Do you have a list of top 1000 albums, that includes all albums from 1000-the most?

Yes, the "Albums" page lists the top 1,000 albums. And the CSV file has a specific column for it ('albumentry_pos').

17 Aug 2016
Full Spreadsheet
Can you bring back the full spreadsheet that lists all the songs and albums on this site?

The full spreadsheet never went away. It can be accessed through the versions page, or various other links. We suspect that you have a stale link to a version that is no longer available. Remember that the CSV file name includes the data version number, so at the time of writing the file is called "impact-chart-2-6-0001.csv" but shortly the most recent version will be "impact-chart-2-6-0002.csv" (and so on). The version number is crucial to understanding what the data is telling you, so we won't ever create a file called (for example) "impact-chart.csv". The text at the foot of each page tells you the version number (and links to the version page).

15 Mar 2016
Regional rankings for the long list csv file
First, I want to thank you guys on this site for the hard work that has been put into making these lists and making them available to us. Second, I have a request about the regional scores in the ultra-long list CSV file (impact-chart-2-5-0022.csv). Currently there are rankings-per-region columns which only show the top 20 rankings, leaving many blank spaces in the columns for songs not in the top 20.
Surely if you used the raw score for usa, eur, eng as in the top5000songs-2-5-0022.csv file, it would be more useful, as it could be sorted and ranked by these scores for every song on the list and then filtered by year, decade etc?

The regions in the impact-chart files are different from those in the raw files. In the impact-chart file there is "North America" (i.e. the US and Canada) and "Europe" (i.e. UK, Germany, France, Ireland etc). In the raw scores the regions are "USA", "Other English Speaking" (UK, Australia, Canada, Ireland etc), "Rest of Europe" (Germany, France, Italy etc) and "Rest of World". Notice that Canada, Australia and the UK, for example, are grouped differently.

The listings in the impact-chart file are meant to be only the reliable ones; by restricting ourselves to just the top 20 (and only in certain years) we can achieve that. Of course that does mean that most of the songs are blank (i.e. we don't have enough data to deliver a reliable answer).

The listings in the top5000songs and top3000albums files are far closer to the raw values. Our expectation is that people who want that level of detail won't mind a bit of calculation and will be aware that a song scoring 5.001 is "really" the same as one scoring 5.000 (they are both in position 4000 or so). The full calculation we use is described in the FAQs (and requires the yearfactor CSV file as well).

If you want to know what was the 927th biggest hit in Europe in the 1970s you can use the top5000songs file (and the yearfactor file) to calculate exactly that. We would suggest that the results produced would be, let's say, unreliable, but we provide the data for anyone to do that. If, while you are trying things out, you uncover anything interesting we'd like to hear from you.

18 Dec 2013
2012
hello, hope to see the top songs for the year 2012 soon

The first version should be available in the next few weeks...
they should be trustworthy in about 2017.

17 Jul 2013
hello everybody, I really like this site. Hope that it is still being updated, since lately there were no updates, especially to the recent years' top 100.

It is being updated, but the recent charts have not been touched for some time.

1 Sep 2012
2.1.0026
this version doesn't contain any evaluation in song_pos+year_pos+artist_pos!?

The refactored code, which shouldn't have changed anything, had a bug in it; as a result the fields you mention were blanked out in the CSV file. We've fixed the code (and added in entries for the decade positions, North American positions and European positions). Thanks for pointing out our mistake.

12 Nov 2011
reshuffle
i really like this site. may i ask how come recently a reshuffle in the top lists occurred

The way we calculate the scores was changed quite radically last month. That is why the data version number went from 1.10 to 2.0. The reason for the change was that some users pointed out some anomalies in the way songs from the 1990s were ordered, particularly music that had success in the USA but didn't do well in Europe. We modified the scoring system to overcome some issues with having too many charts from smaller countries. The first attempt introduced some other unwanted features so the algorithm has been further tuned. We hope that the overall result is better. If you see any results that look "odd" tell us about them.

10 Mar 2011
looking for all the number 9 chart positions can you help please
Looking for all the number 9 chart positions can you help please

The first thing to say is that we list positions from a large number of charts; if you want data from a particular chart you'd find it easier to get that data direct. For example, if you just wanted the songs that reached number 9 in the Billboard charts then you'd find it easier to use the "Bullfrog" listing (see the "Source Charts" page to see where to get it).
Then you can use a spreadsheet program to identify entries that reached the number 9 slot.

The easiest way to do any calculation is to download the CSV file (from the page that describes the version numbers). It's not clear if you are looking for the number 9 positions in our annual charts, or for songs that reached number 9 in one of the source charts. If you load the CSV file into a spreadsheet you should be able to filter on the "year_pos" attribute and select just the songs with 9 in that column.

Alternatively, if you're looking for songs that reached number 9 in the source charts, we don't have the data to identify all songs that were at the number 9 position at some time in their run, but we can tell you which songs peaked at number 9. The easiest way to do this calculation is to search the CSV file for the string " 9 ". For example on Linux (or CygWin) the command:

    grep ' 9 ' impact-chart-1-10-0003.csv

will list all the songs that peaked at number 9 anywhere. Alternatively the command:

    grep 'Holland 9 -' impact-chart-1-10-0003.csv | grep " 197[3-7]"

will identify all the songs that peaked at number 9 in Holland and then select only those which have years in the range 1973-1977. Combining simple searches in this way can quickly identify songs, for example the command:

    grep 'Holland 9 -' impact-chart-1-10-0003.csv | grep -i london

shows that the only song which peaked at number 9 in Holland and mentioned London in the title was Ralph McTell's "Streets of London" from 1974. Of course we don't know why anyone would want to do that particular search.

20 Feb 2011
CSV file II
Yes i'm novice. I really appreciate the "lesson" above which shows the great character you have. A big big THANK YOU.

27 Jan 2011
CSV file
I've got ideas about what I'd like to do but have no skills on working with your excel file. One of them would be making a list with 2011 songs (1011 more than you show) representing the actual year. Would you please teach me how to do it?
Another thought: Would it be possible for you to do a month song table? For example: a list of 30 songs that did well in January in all years. 30 songs = 30 days :) Every now and then I visit you to appreciate your great work. Thank you guys!

The CSV file allows you to try all kinds of different orders. Let us do some examples using Excel 2007 (you'll be able to do similar things with any good spreadsheet program). The explanation below assumes that you really are a complete novice in using Excel; if you find it a bit too simplistic we apologise.

First, to list the top 2011 songs just select the "Sort" function under the "Data" tab. This will give you a "Sort" dialog. Make sure that the "My data has headers" toggle is ticked. Sort by "type" ("Z to A"), then by "score" ("Largest to Smallest"), then by "artist" ("A to Z"). The "Add Level" button lets you insert the additional criteria. That should give you all the songs in order; the first 1000 will already be numbered for you in the song position column (song_pos).

Suppose that instead of sorting all the songs we want the 1000 highest scoring songs for the period from 1975 to 1995. We can do this by inserting a new column. Click with the right mouse button on the "F" just above the column heading that says "song_pos" and select "Insert" from the menu that comes up. That gives us a new blank column. In the top cell put the string "in_period". Click on the second cell, then in the text entry area at the top (next to where it says "fx") enter the text '=IF(D2<1975,"",IF(D2>1995,"","yes"))'. Now click on that cell and press <Ctrl>-C. Click on the F3 cell, scroll to the bottom of the sheet and hold the <Shift> key down while clicking on the F64355 cell; that should highlight a column of cells. Finally paste with <Ctrl>-V. Now go to "Data"->"Sort" again and add "in_period" as an extra sort criterion (use the blue arrow to make it the top one). We have an ordered list of the songs from 1975-1995.
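For readers who would rather avoid the spreadsheet, the same "songs from 1975 to 1995, highest score first" selection can be sketched in Python with the standard csv module. Treat this as a rough sketch under stated assumptions: the "name" column, the type value "song", and every row of sample data are invented for illustration (only "type", "year", "score" and "artist" are column names mentioned in the text), and the real file would be opened from disk rather than from a string. A side benefit is that csv keeps every value as text, so a title such as "1-2-3" is not turned into a date.

```python
import csv
import io

# Invented stand-in for the real CSV file: quoted fields, CRLF line ends.
# The "name" column and all rows are hypothetical.
SAMPLE = (
    '"artist","name","type","year","score"\r\n'
    '"Artist A","1-2-3","song","1965","7.5"\r\n'
    '"Artist B","Song B","song","1980","5.0"\r\n'
    '"Artist C","Song C","song","1990","9.25"\r\n'
    '"Artist D","Album D","album","1985","8.0"\r\n'
    '"Artist E","Song E","song","1996","6.0"\r\n'
)


def top_in_period(rows, first_year, last_year, limit=1000):
    """Return up to `limit` songs from the given years, best score first."""
    songs = [
        r for r in rows
        if r["type"] == "song"  # assumed type label
        and r["year"].isdigit()
        and first_year <= int(r["year"]) <= last_year
    ]
    # Largest score first, then artist A to Z, mirroring the Sort dialog.
    songs.sort(key=lambda r: (-float(r["score"]), r["artist"]))
    return songs[:limit]


# newline="" leaves the CRLF line ends for the csv module to handle.
rows = list(csv.DictReader(io.StringIO(SAMPLE, newline="")))

for row in top_in_period(rows, 1975, 1995):
    print(row["year"], row["score"], row["artist"], row["name"])
```

With a real download, replacing the `io.StringIO(...)` call with `open(path, newline="", encoding="ascii")` should be enough, since the file is described as plain ASCII.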
How about if we want to adjust the scores? For example let's find the most outstanding albums of each year. First we have to decide what the term "outstanding" means. We have a varying number of charts for each year, so the scores are biased towards years with lots of charts. So let's adjust the scores by dividing them by the average score of the 10th to 20th positions in each year.

First sort by "type" (A to Z), then "year", then "ayear_pos" (Smallest to Largest). Now insert two columns to the left of "song_pos". One we'll call "sum10to20", the other "factor". In the first column (cell G2) insert the formula "=IF(OR(D2<>D3,N2>20),0,IF(N2<10,G3,G3+E2))" (where D is the "year", and N is the "ayear_pos"). Copy that to the rest of the cells in that column. In the H2 cell insert the formula "=IF(D2=D1,H1,G2/10)" and copy it down in the same way. Now all the cells with the same album year should have the same factor in them.

Now select column H by clicking on its header button; all the cells should be highlighted with a solid border. Copy these cells with <Ctrl>-C and on the "Home" tab select the pulldown menu under "Paste"; in that menu pick "Paste Values". Now we can shuffle the cells and the values in column H won't change. Change column G's name to "adj_score", in cell G2 insert the formula "=IF(H2=0,0,E2/H2)" and copy it to the rest of the column. Finally sort by "adj_score" (Largest to Smallest). This gives us a list of the albums that are furthest ahead of their contemporaries, like "Genius of Modern Music, Vol 2". You might decide that the factor is a bit too aggressive; changing the formula in column G to "=IF(H2=0,0,E2/SQRT(H2))" makes the order more reasonable (after sorting again of course).

Many of the charts we use don't tell us which month an entry belongs to. And, of course, if a song enters a chart in May and is in the charts for 12 weeks it probably spent more of June in the charts than May. So it's hard to see how we could calculate a reasonable "Monthly" chart.
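The year-adjusted album score worked through above (before the note on monthly charts) can also be sketched in Python. The field names "year", "score" and "ayear_pos" come from the text; the function names, the in-memory list of dicts and the sample numbers are assumptions for illustration. One deliberate difference: the sketch takes the true mean of positions 10 to 20, whereas the spreadsheet formulas divide that sum by 10, so the factors differ by a constant ratio and the resulting order is the same.

```python
import math
from collections import defaultdict
from statistics import mean


def adjusted_scores(albums, use_sqrt=False):
    """Rank albums by score relative to positions 10-20 of their year.

    albums: list of dicts with "year", "score" and "ayear_pos" keys.
    use_sqrt: divide by the square root of the factor instead, the
        gentler variant suggested in the text.
    """
    mid_scores = defaultdict(list)
    for album in albums:
        if 10 <= album["ayear_pos"] <= 20:
            mid_scores[album["year"]].append(album["score"])
    factors = {year: mean(scores) for year, scores in mid_scores.items()}

    ranked = []
    for album in albums:
        factor = factors.get(album["year"], 0)
        if factor == 0:
            adj = 0.0  # no usable factor for this year
        else:
            adj = album["score"] / (math.sqrt(factor) if use_sqrt else factor)
        ranked.append({**album, "adj_score": adj})
    ranked.sort(key=lambda a: a["adj_score"], reverse=True)
    return ranked


def make_year(year, top_score, mid_score):
    """Build invented sample rows: one number 1 album plus positions 10-20."""
    rows = [{"year": year, "ayear_pos": 1, "score": top_score}]
    rows += [{"year": year, "ayear_pos": p, "score": mid_score}
             for p in range(10, 21)]
    return rows


sample = make_year(1950, 40.0, 10.0) + make_year(2000, 60.0, 30.0)
best = adjusted_scores(sample)[0]
print(best["year"], best["adj_score"])
```

In the invented sample the 1950 album has the lower raw score but sits much further ahead of its contemporaries, so it comes out on top after adjustment.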
We appreciate your support and suggestions.

14 Oct 2010
version control
I have been following your ranking system for a long time. I like it a lot. The only complaint I have is that you change the ranking too frequently. I would recommend that you change the ranking, say, every half year. Each half year ranking you give a version, say Ver. Jan 2010 etc. It is the same thing as your software having different versions. By doing this it will make my life easier because I am using your ranking to collect the music. Every time you make the change I have to adjust too. It is almost an impossible job to follow your floating ranking. Thank you for your consideration. Or you can find some better solution.

The obvious solution would be for you to download the CSV file (linked on the "Version" page) and use that for a period of time.

We estimate that the data is better than 99.4% correct, which means that we "only" have 1600 or so errors! We have a continual effort to identify and correct discrepancies between the various source charts, supported and encouraged by users that tell us about corrections that are needed. It is true that these corrections end up shuffling some of the songs, especially those from before 1940, after 2007 and low down in each artist's song list. Those are exactly the ones we have least evidence for. In addition these "less measured" parts of the data get changed when we identify and integrate new sources. We suspect that has a bigger impact on the items you are collecting.

However we feel that publishing the most recent version of the data, with the most up to date corrections, is more important than having a list that stays static at the places where the rankings are less certain. Our version numbers are, of course, exactly like software release numbers (of course they ARE software release numbers). But our approach is more "open source" than "big software developer", so, like open source software, we release often.
As we said at the beginning, we would suggest you download the CSV file and use it as your master list for a while, switching to a new version when YOU feel the time is right to do so (rather than when we decide to release a new set). Oh, and thanks for the support.

13 Sep 2007
vince spain
When do you change the charts? Can a song move in the chart? What charts are closed?

As song or artist names are corrected they may affect the order of the charts. For example the Vaughn Monroe song "Riders in the Sky" was listed as "Ghost Riders in the Sky" in the European charts. So that chart had "Riders in the Sky" as number 7 of 1949 and "Ghost Riders in the Sky" as number 33. When the European entries were changed to be the same as the rest of the charts the new chart placed the song at number 1 for the year. The quality of the input data varies from one source to the next. As continual effort is expended to refine the data it is inevitable that the positions of some resulting entries will change (reflecting a higher quality set of results). |