Hi guys!
I have been reading lately about the dangers of having all the material of a particular website or portal in the hands of a single person: if something happens to the site, the community loses the material forever. However, I also understand that not everybody wants a mirror of his or her stuff publicly available everywhere. Thinking about this, I came up with the idea of creating a repository that holds all the Amstrad stuff but is not available to the general public. Something like a huge backup that would allow us to restore files if something catastrophic happens to a website or portal. Do you think this would be a good idea? I actually could have the infrastructure to do something like this if the community is interested and we are not talking about a huge number of terabytes of data :) . I do not know if it would be possible to store the huge amount of videos that populate YouTube, but definitely scans, games, programs, and everything that has historic meaning for us. Do you think that everything would fit in, let's say, 6 TB?
A big thing to consider is the wiki itself - such a vast amount of knowledge on here. I know Gryzor has backups. I think it should also be starting to get mirrored on archive.org. But another mirror is always good. The Haiku OS community lost a lot of the old BeOS software quite recently because the guy who ran BeBits and Haikuware got in a huff about some decisions of the Haiku project and took it down (actually he more-or-less held the project to ransom for it if they didn't remove the new package management from Haiku). So much software, lost forever, very sad.
(Not that I'm suggesting Gryzor would do anything like that! But backups don't hurt)
Very good point.
I thought about this several times, especially because of my experience with other sites like WOS changing hands, and I even thought about starting a project myself. But to be honest, why would I do things better? Private initiatives are likely to face the same risks and challenges.
That's why I'm getting more and more convinced that putting resources into archive.org, or a similar non-profit, would be the best solution.
They already have archives for magazines (e.g., Micromania Segunda Epoca (Spanish): https://archive.org/details/micromania-segunda-epoca) and other old systems (e.g., The Software Library: Atari Computer: https://archive.org/details/softwarelibrary_atari); perhaps we could contact Jason Scott or another archivist and see how to start a chapter for the CPC.
No disrespect to anyone, but we need to make sure this shizzle is backed up. 6TB disk is probably more than enough, but you'd need to then have RAID 1, 5 or 10 set up in hardware, in case of a disk failure. Then you'd probably need to back it off to DLT or LTO tape to make sure data is constantly being de-duplicated and off site in case of a fire or burglary.
Quote from: ukmarkh on 15:33, 17 February 16
No disrespect to anyone, but we need to make sure this shizzle is backed up. 6TB disk is probably more than enough, but you'd need to then have RAID 1, 5 or 10 set up in hardware, in case of a disk failure. Then you'd probably need to back it off to DLT or LTO tape to make sure data is constantly being de-duplicated and off site in case of a fire or burglary.
It all depends on the size. Here, for example, I have an automatic mirroring system that copies all the data to the datacenter of this building and to another datacenter that is miles away every night. We also keep three more copies of everything around the lab. This is possible for some hundreds of GB, but not for 50 TB :) . A few TB is probably as much as I can handle and, of course, it would be really really great to have mirrors of this somewhere else.
Every action towards preservation has my vote
This thread reminds me of cpczone :(
And me of PnP...
Quote
Every action towards preservation has my vote
(http://cdn.meme.am/instances/15409427.jpg)
Jokes apart: yeah, it is very important to have some backups in different places.
What is the size of the CPCWiki and its forum?
Maybe a distributed private cloud as one backup, spread over all of our PCs?
Something like https://www.getsync.com/ or so. I use big datacenters at work for backup, so I do not have much knowledge about distributed backup.
Quote from: Dr Tiger Ninestein on 19:09, 17 February 16
This thread reminds me of cpczone :(
World Of Spectrum still isn't working properly all this time on from the ownership change. A perfect example of a community relying on one resource and one person and when they quit everything going wrong.
It's a crying shame what has happened to WOS. :'(
Quote from: ukmarkh on 22:50, 17 February 16
It's a crying shame what has happened to WOS. :'(
Not entirely the same thing though, there was already a handover in process long before the server crash. The issues with it have mostly been down to a desire by the new team to do a complete rewrite behind the scenes. It's not the way I'd have gone about things, personally, but there you go. It's certainly a shadow of its former self though.
I'd still take that over what happened with CPCZone, or even things like Push'N'Pop where some genuinely useful information was undoubtedly lost.
I don't want to talk about CPC Zone... Too Sad!
Quote from: ukmarkh on 22:50, 17 February 16
It's a crying shame what has happened to WOS. :'(
It's down again and word on the street is that it is so broken and the workload so great it's doubtful it will ever be fully working again. Apparently a different team are working on a new Speccy resource using the same Infoseek database.
The current WoS site (well before today) is/was a temporary set-up using the data from the old server, but running on a new server. Because of the way that the original site grew by using custom code to glue various sub-systems together, and the lack of up-to-date versions of the software concerned, the team decided to start a new site from scratch.
As this is taking a while to sort out, if you use the current site, it does look like nothing much is happening.
But apparently all the data is backed up.
Yes, there is a big risk with WoS because of its history: it was set up and operated by one person who did not allow mirrors. Should the current team suffer a big real life problem, it's very possible that WoS will die.
And I agree with the comments made above about ensuring that there are multiple back-ups and copies of everything (well maybe not all the videos) relating to the CPCs.
It would also be a good idea if there are at least two caretakers just in case the owner/operator of this site suffers from a big real life problem.
Mark
Quote from: ukmarkh on 23:11, 17 February 16
I don't want to talk about CPC Zone... Too Sad!
What happened to that place in the end anyway? I was on hiatus for a couple of years, came back and the place had vanished!
ftp://ftp.lip6.fr/pub/amstrad/ is as close as it gets to this. But I'm not sure if it's still being updated (maybe Paris 6 university just forgot they have an FTP server running).
Anyway, a single-place archive isn't very safe, so the only way that really works is having several copies in different places and owned by different people. And never trust one single website with your data, even something big like cpcwiki/pushnpop.
It's times like this, we really have to appreciate this site and forum. A big shout out to Gryzor and all who make it a wonderful resource.
Quote from: PulkoMandy on 18:54, 19 February 16
ftp://ftp.lip6.fr/pub/amstrad/ is as close as it gets to this. But I'm not sure if it's still being updated (maybe Paris 6 university just forgot they have an FTP server running).
Anyway, a single-place archive isn't very safe, so the only way that really works is having several copies in different places and owned by different people. And never trust one single website with your data, even something big like cpcwiki/pushnpop.
What I am going to do, at least, is to download everything from there and make sure that it is safe :)
Quote from: 1024MAK on 18:48, 19 February 16
It would also be a good idea if there are at least two caretakers just in case the owner/operator of this site suffers from a big real life problem.
Mark
WOS now has a mirror and this place should probably have one as well with backups that can be reverted back to.
Who runs NVG these days? Could or would NVG be able to run a mirror?
In the meantime, I made 3 copies and I put one on the network drive, the one that has multiple mirrors around London. It is, of course, not available to the general public but at least we know that it will not be lost :). To be honest, it is less than 500 MB, so each one of us could keep a copy at home. This would probably be the best possible backup.
Quote from: chinnyhill10 on 19:21, 19 February 16
WOS now has a mirror and this place should probably have one as well with backups that can be reverted back to.
I believe cpcwiki is backed up every day.
I have seen posts that indicate this.
Quote from: ||C|-|E|| on 19:37, 19 February 16
In the meantime, I made 3 copies and I put one on the network drive, the one that has multiple mirrors around London. It is, of course, not available to the general public but at least we know that it will not be lost :) . To be honest, it is less than 500 MB, so each one of us could keep a copy at home. This would probably be the best possible backup.
Good idea. I have started downloading ftp://ftp.nvg.unit.no/pub/cpc/ (without NC100 and PCW) now.
-> 700MB downloaded.
I have downloaded it as well :D . At least, the contents of those two places should be safe now :D :D
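Since several of us now hold copies of the same FTP trees, it may be worth checking that the copies actually match. Below is a minimal sketch, assuming each person runs it over their own download folder and shares the resulting manifest; the function names `build_manifest` and `compare_manifests` are made up for illustration and are not part of any tool mentioned in this thread.

```python
import hashlib
from pathlib import Path


def build_manifest(root):
    """Map each file path (relative to root) to the SHA-256 digest of its contents."""
    root = Path(root)
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            # Reads each file fully into memory; fine for small disk images,
            # but very large files would be better hashed in chunks.
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest[path.relative_to(root).as_posix()] = digest
    return manifest


def compare_manifests(mine, theirs):
    """Return (missing from mine, only in mine, present in both but different)."""
    missing = sorted(set(theirs) - set(mine))
    extra = sorted(set(mine) - set(theirs))
    different = sorted(p for p in set(mine) & set(theirs) if mine[p] != theirs[p])
    return missing, extra, different
```

If all three lists come back empty, the two copies hold the same files with the same contents.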
Quote from: chinnyhill10 on 15:58, 19 February 16
It's down again and word on the street is that it is so broken and the workload so great it's doubtful it will ever be fully working again. Apparently a different team are working on a new Speccy resource using the same Infoseek database.
WoS is back online :D
Mark
Quote from: ||C|-|E|| on 00:25, 20 February 16
I have downloaded it as well :D . At least, the contents of those two places should be safe now :D :D
So did I with ftp://ftp.lip6.fr/pub/amstrad/
I'm listening...
The CPCWiki takes, IIRC, ca. 100 GB.
But the major problem is, I think, deciding what "everything" is. Sure, a copy of all the disk images or the GamebaseCPC for instance is easy enough to define (though not to update, mind you - it would take a few people watching out for stuff, and keep in mind that even here on the Wiki we miss lots of stuff!), but with sites etc it becomes more difficult. Say you mirror the CPCWiki - what are you going to do without the database?
I have a big archive containing most of the CPCZone stuff (courtesy of an anonymous donor) but what can I do with it since I don't have the db?
Quote from: Gryzor on 10:46, 22 February 16
I have a big archive containing most of the CPCZone stuff (courtesy of an anonymous donor) but what can I do with it since I don't have the db?
What does the archive contain? Everything except the forum posts?
Quote from: Executioner on 10:57, 22 February 16
What does the archive contain? Everything except the forum posts?
I'll have to check, really, but I think so.
This thread is really interesting. Thinking about what you all have said, I imagine something like distributed mirrored hosting. The idea would be to have a central "entity" like WOS, CPCWiki or whatever, but replicated among many individual supporters. Similar to BitTorrent, but for keeping data and access safe. With a solution like this, the number of complete copies would be a measure of how safely the data is guarded.
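As a rough illustration of that "number of complete copies" measure, here is a small sketch. It assumes we have a catalogue of file names and a record of which supporter holds which files; both data structures and the function names are hypothetical, not taken from any existing system.

```python
def complete_copies(catalogue, holdings):
    """Count the supporters who hold every file listed in the catalogue."""
    wanted = set(catalogue)
    return sum(1 for files in holdings.values() if wanted <= set(files))


def least_replicated(catalogue, holdings):
    """Return (file, copies) for the catalogue file held by the fewest supporters."""
    counts = {f: sum(f in files for files in holdings.values()) for f in catalogue}
    weakest = min(counts, key=counts.get)
    return weakest, counts[weakest]


# Hypothetical example: three supporters, one of them holding a partial copy.
catalogue = ["a.dsk", "b.dsk", "c.dsk"]
holdings = {
    "alice": {"a.dsk", "b.dsk", "c.dsk"},
    "bob": {"a.dsk", "b.dsk"},
    "carol": {"a.dsk", "b.dsk", "c.dsk", "extra.dsk"},
}
print(complete_copies(catalogue, holdings))   # 2
print(least_replicated(catalogue, holdings))  # ('c.dsk', 2)
```

The second function shows the weakest link: the archive as a whole is only as safe as its least-copied file.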
I'm digressing. Maybe we should start with something as simple as any kind of automatic mirroring software and some servers. I can try to start a software preservation project at my university to get space for mirroring. It would also be nice to contact other entities, like archive.org, as suggested.
Count on me for any help with preserving software :D
By the way, who does everybody think the safekeeper will be? The new wiki server has tons of space left, so, I don't know, maybe we could set up something here, too...
@Gryzor (http://www.cpcwiki.eu/forum/index.php?action=profile;u=1) while we're on the topic, do you have arrangements or a second in command for the worst case scenario where something happens to you? Like if the C64 scene kidnap you and hold the site to ransom for example?!
TBH I think the easiest thing if the whole site needed to be backed up would be for Gryzor to keep an off site backup taken once every month or so (or less if the incremental differences are small enough) and arrange for someone to be in place in case anything happened to him.
A bit off topic. At my last job our off site backups meant a tape drive backup started every night around the time work finished, that was physically removed from site at the end of the following day (we had to remember to remove the old tape and put the new one in every night). In this case we were unable to connect the relevant machine to any outside network for security reasons though. We reused a lot of tapes, but we would at any time have one tape for each of the last 7 days, one from each of the last 4 weeks, one from every single month going back a couple of years, and beyond that one for every 6 months.
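The tape rotation described above can be sketched as a retention rule. This is only an illustration under some assumptions of mine: "a couple of years" is taken as 24 months (`months_kept`), "the last 4 weeks" as the last 28 days split into four 7-day blocks, and the newest backup in each block is the one kept. The function name `tapes_to_keep` is made up.

```python
from datetime import date, timedelta


def tapes_to_keep(backup_dates, today, months_kept=24):
    """Return the backup dates that a daily/weekly/monthly/half-yearly rotation retains."""
    newest_per_bucket = {}
    for d in sorted(backup_dates, reverse=True):  # newest first
        age_days = (today - d).days
        if age_days < 0:
            continue  # ignore dates in the future
        age_months = (today.year - d.year) * 12 + (today.month - d.month)
        buckets = []
        if age_days < 7:
            buckets.append(("day", d))  # one tape per day for the last 7 days
        if age_days < 28:
            buckets.append(("week", age_days // 7))  # one per 7-day block, 4 blocks
        if age_months < months_kept:
            buckets.append(("month", d.year, d.month))  # one per calendar month
        else:
            buckets.append(("half-year", d.year, (d.month - 1) // 6))  # one per 6 months
        for bucket in buckets:
            # Iterating newest first means the first date seen wins the bucket.
            newest_per_bucket.setdefault(bucket, d)
    return set(newest_per_bucket.values())


# Hypothetical example: a backup every night for about three years.
today = date(2016, 2, 25)
nightly = [today - timedelta(days=i) for i in range(1100)]
kept = tapes_to_keep(nightly, today)
print(len(nightly), "backups made,", len(kept), "tapes kept")
```

The point of a scheme like this is that the number of tapes grows very slowly with time, while recent history stays fine-grained.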
That's a very valid question, @Munchausen (http://www.cpcwiki.eu/forum/index.php?action=profile;u=792) , and the answer is yes: there's a second admin who works mostly on the server side and has full access, so it is expected that in the event where I'm suddenly recalled to my home planet there will be someone who can take care of things.
Backups are taken every day and stored, indeed, off-site. If anyone can think of a better scheme, let me know!
So we do have a BOFH in the server room! :)
Well, he's a really nice guy actually, but I had forgotten all about BOFH... geez I got some catching up to do!
Oh man, BOFH. Thank you so much for reminding me of this. :D
You could just use two different storage locations, linked over IP, and use 'RoboCopy' to sync changes in data. There are lots of other options, but this is free... As mentioned in a previous post of mine, offsite tape backups are a good option, but there's investment needed for such things.
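For anyone not on Windows, the same "copy only what changed" idea can be sketched in a few lines of Python. This is not RoboCopy and does not reproduce its options; it is a minimal one-way sync that assumes the second location is reachable as a normal folder path (for example a mounted network share), and the function name `sync_changes` is made up.

```python
import shutil
from pathlib import Path


def sync_changes(source, destination):
    """Copy files that are new or changed in source to destination; return what was copied."""
    source, destination = Path(source), Path(destination)
    copied = []
    for src in sorted(source.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source)
        dst = destination / rel
        if dst.exists():
            s, d = src.stat(), dst.stat()
            # Treat a file as unchanged when the size matches and the copy is not older.
            if s.st_size == d.st_size and int(s.st_mtime) <= int(d.st_mtime):
                continue
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copy2 also carries over the modification time
        copied.append(rel.as_posix())
    return copied
```

Note that this never deletes anything at the destination, which is probably what you want for an archive: a file removed by mistake at the source survives in the copy.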
We do that by syncing the server to a remote one already. As for tapes... Yeah, well, if people can pool together some money to get me a tape backup solution, I can do that on a weekly basis :D
Quote from: Gryzor on 11:51, 25 February 16
We do that by syncing the server to a remote one already. As for tapes... Yeah, well, if people can pool together some money to get me a tape backup solution, I can do that on a weekly basis :D
If you have local and remote backups and a second person to take over in your place I think everything is covered. I don't feel any concern that the site will disappear.
Pity, I was really hoping people would buy me a tape backup system :)
Quote from: Gryzor on 12:28, 25 February 16
Pity, I was really hoping people would buy me a tape backup system :)
Well it really is a pain changing the disks all the time! But having more redundancy is never a bad thing when it comes to data preservation!
Quote from: Gryzor on 12:28, 25 February 16
Pity, I was really hoping people would buy me a tape backup system :)
I have a sealed C90 tape if that'll help : ;D
Quote from: seanb on 14:26, 25 February 16
I have a sealed C90 tape if that'll help : ;D
A good start! Like, what, six games worth of tape? Let's go!