What I have is a folder full of Protein Data Bank files (.pdb) whose contents follow this pattern:
HEADER OXIDOREDUCTASE 27-FEB-12 4DXH
TITLE HORSE LIVER ALCOHOL DEHYDROGENASE COMPLEXED WITH NAD+ AND 2,2,2-
TITLE 2 TRIFLUOROETHANOL
COMPND MOL_ID: 1;
COMPND 2 MOLECULE: ALCOHOL DEHYDROGENASE E CHAIN;
COMPND 3 CHAIN: A, B;
COMPND 4 EC: 1.1.1.1
SOURCE MOL_ID: 1;
SOURCE 2 ORGANISM_SCIENTIFIC: EQUUS CABALLUS;
SOURCE 3 ORGANISM_COMMON: DOMESTIC HORSE,EQUINE;
SOURCE 4 ORGANISM_TAXID: 9796;
SOURCE 5 STRAIN: DOMESTIC HORSE;
SOURCE 6 ORGAN: LIVER;
SOURCE 7 OTHER_DETAILS: LIVER
KEYWDS ALCOHOL DEHYDROGENASE, NAD+,TRIFLUOROETHANOL, MICHAELIS COMPLEX
KEYWDS 2 ROSSMANN FOLD, OXIDOREDUCTASE
EXPDTA X-RAY DIFFRACTION
AUTHOR B.V.PLAPP,S.RAMASWAMY
REVDAT 4 27-JUN-12 4DXH 1 JRNL
REVDAT 3 16-MAY-12 4DXH 1 JRNL
REVDAT 2 02-MAY-12 4DXH 1 TITLE
REVDAT 1 11-APR-12 4DXH 0
What I want to do is grep out the title, author, compound, release date (from the HEADER line), and source. As you can see, SOURCE, for example, spans multiple lines. I want to build a table with one column for each of the categories mentioned above. I cannot figure out how to group all the SOURCE lines (and any other category that spans multiple lines) into one line.
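
To make the goal concrete, here is a rough sketch of the grouping, written in Python rather than grep. It assumes the layout shown in the sample above: the record name is the first word on each line, and continuation lines carry a running number right after it. The function names (`collect_records`, `summarize`, `write_table`), the column names, and the tab-separated output are all made up for illustration. Rejoining a line that ends in a hyphen without a space (as in the TITLE record) is a guess that happens to fit this sample.

```python
import csv
import re
from pathlib import Path

# Record types to pull out of each file (names as they appear in the sample).
WANTED = ("HEADER", "TITLE", "COMPND", "SOURCE", "AUTHOR")
# Matches a date such as 27-FEB-12 on the HEADER line.
DATE_RE = re.compile(r"\b\d{2}-[A-Z]{3}-\d{2}\b")
# Hypothetical column names for the output table.
COLUMNS = ["file", "release_date", "title", "author", "compound", "source"]


def collect_records(lines, wanted=WANTED):
    """Join all lines of each wanted record type into one string."""
    merged = {}
    for line in lines:
        parts = line.split(None, 1)
        if len(parts) < 2 or parts[0] not in wanted:
            continue
        name, rest = parts[0], parts[1].strip()
        if name in merged:
            # Continuation line: drop the leading continuation number.
            rest = re.sub(r"^\d+\s+", "", rest)
            # Assumption: a trailing hyphen means the text was split mid-word.
            sep = "" if merged[name].endswith("-") else " "
            merged[name] += sep + rest
        else:
            merged[name] = rest
    return merged


def summarize(lines):
    """Build one table row (a dict) from the lines of a single file."""
    rec = collect_records(lines)
    match = DATE_RE.search(rec.get("HEADER", ""))
    return {
        "release_date": match.group(0) if match else "",
        "title": rec.get("TITLE", ""),
        "author": rec.get("AUTHOR", ""),
        "compound": rec.get("COMPND", ""),
        "source": rec.get("SOURCE", ""),
    }


def write_table(folder, out_path):
    """Write one tab-separated row per .pdb file found in `folder`."""
    with open(out_path, "w", newline="") as out:
        writer = csv.DictWriter(out, fieldnames=COLUMNS, delimiter="\t")
        writer.writeheader()
        for path in sorted(Path(folder).glob("*.pdb")):
            with open(path) as fh:
                row = summarize(fh)
            row["file"] = path.name
            writer.writerow(row)
```

Calling `write_table("my_pdb_folder", "summary.tsv")` (both names are placeholders) would then give one row per file, with each multi-line category collapsed into a single cell. If the files keep the original fixed-column spacing, slicing by column position may be more robust than splitting on whitespace.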