
The knackered formatting of the historic posts


anthonym

If you, the club, can give me some file examples, preferably of some of my own old stuff for which I can find you links, I will have a crack at looking for a hex editing solution.

Drupal has long had a line ending problem, which I suspect is what underlies the broken formatting problem. I have solved this very issue in other scenarios over the years.

It would help to know what sort of server we are on. I guess Unix of some sort, but that’s just an unfounded guess. The difference between servers is in the line-ending bytes at the hex level; Mac versus PC is the most common problem. It may be as simple as an AWK script.
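To illustrate the kind of check I mean, here is a minimal sketch in Python (rather than AWK) that counts which line-ending styles appear in a chunk of raw bytes. It assumes a post body can be exported as bytes; the function name `count_line_endings` and the sample data are made up for illustration, and nothing here is known about how the real data is actually stored.

```python
def count_line_endings(data: bytes) -> dict:
    """Count line-ending styles in raw bytes (hypothetical helper).

    CRLF (0x0D 0x0A) is typical of Windows, lone CR (0x0D) of classic
    Mac OS, and lone LF (0x0A) of Unix-like systems.
    """
    crlf = data.count(b"\r\n")
    # Every CRLF pair contains one CR and one LF, so subtract the pairs
    # to get the lone occurrences.
    lone_cr = data.count(b"\r") - crlf
    lone_lf = data.count(b"\n") - crlf
    return {"CRLF": crlf, "CR": lone_cr, "LF": lone_lf}


if __name__ == "__main__":
    # Made-up sample mixing all three styles.
    sample = b"first line\r\nsecond line\nthird line\rfourth line"
    print(count_line_endings(sample))
```

A mix of styles within one post, or a style that does not match the server, would be a hint that the import altered the line endings.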

Anthony


  • Support Team

Hi Anthony - I don't have the original data to hand but I'm sure I could get hold of it. I'm not around for the next 2 weeks but will see what I can find when I'm back.

The original site import was fairly rudimentary and involved some pretty heavy spreadsheet manipulation in order to combine the content from the 3 different old sites and assign it to user IDs. On the old BC, users had multiple aliases, which caused the biggest headache. It was probably this process that 1) messed up the formatting and 2) lost the embedded URL links.

The migration was the single biggest expense in the move and the most fraught part of it. The sheer size of the database meant it took about 2 weeks to do the import and indexing. When the data issues were spotted we had been live for over 2 weeks and it just wasn't possible to go back and re-do the import. We couldn't do an update/fix because there was no unique key match between the old database and the new (Drupal has its own unique key and this wasn't written back to the old DB). I'm sure that with the right knowledge, time and access to the system, someone could work out a way to do it, but the cost to pay someone to do this would be huge.

I don't know what the underlying server architecture is exactly because we host with a specialist Drupal host provider (Pantheon) who provide a flexible, scalable, demand-driven architecture. I think it is UNIX underneath and possibly a MySQL DB, but I'd have to check with our providers.

Shaun


Shaun,

One thing that really makes me wonder: You say that "The migration was the single biggest expense in the move"

With all the errors, why was it paid for?

I am also sure that someone in the know would have been able to extract data from "the last two weeks" (as in the re-indexing period) and then merge this with the old data. The longer we wait the worse it gets.

 


Hi Shaun, it's a shame I had no idea; I have decades of experience in that sort of thing. However, what is done is done. I will have a look at Pantheon just in case an idea manifests. The principal problem appears to be LF errors (just one, repeated). The fundamental question for me is whether there is something unique at the hex level (just like indices). Given that LFs are unique, it may be possible to literally see the problem and fix it with a hex search and replace. The trouble would be if that breaks something else.
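As a rough sketch of what "seeing" and fixing it at the hex level could look like, the Python below dumps bytes as hex and does a byte-level search and replace that turns CRLF and lone CR into LF. It is only an illustration under assumptions: I don't yet know what the broken byte sequence in the real data is, and the helper names `hex_view` and `normalize_newlines` are hypothetical.

```python
def hex_view(data: bytes) -> str:
    """Return the bytes as space-separated hex pairs (needs Python 3.8+)."""
    return data.hex(" ")


def normalize_newlines(data: bytes) -> bytes:
    """Convert CRLF and lone CR to LF (hypothetical helper).

    CRLF is replaced first so that a Windows line ending does not
    turn into two LFs.
    """
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


if __name__ == "__main__":
    # Made-up sample: 0d 0a is CRLF, a lone 0d is CR, a lone 0a is LF.
    sample = b"a\r\nb\rc\nd"
    print(hex_view(sample))
    print(hex_view(normalize_newlines(sample)))
```

The risk mentioned above applies here too: a blind replace could touch bytes that are not line endings at all (inside binary fields, for example), so it would need trying on a copy of a small sample first.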

Alternatively (or as well) access to the original data would help, plus which operating systems and programs were in use as these determine the hex source. I have seen Drupal has had issues in this area.

In other words, as much system info as possible. For example, where is the site data from the site that was for a while running on 234.com (I have forgotten what the correct number was)? How much data are we talking about?

Any or all of the above may offer insight to a solution. If we could find something that works in test or prototype we could then look at the full size issues.

The URLs I have less idea, but again having info would allow me some chance.

If any of the transition spreadsheets still exist they could be helpful. Use of spreadsheets also suggests the data volume may not be so huge, but maybe they are subsets. These can also mess with LFs.

Do you use Terminus, and on Mac or Unix/Linux?

Everywhere I look it’s an issue: (which is not a surprise) google

Typical example.

Anthony

