Explore TCGA RNA Expression Data

Create the graph

grip create tcga-rna

Get the data

curl -O http://download.cbioportal.org/gbm_tcga_pub2013.tar.gz
tar xvzf gbm_tcga_pub2013.tar.gz

Load clinical data

./example/load_matrix.py tcga-rna gbm_tcga_pub2013/data_clinical.txt --row-label 'Donor'

Load RNASeq data

./example/load_matrix.py tcga-rna gbm_tcga_pub2013/data_RNA_Seq_v2_expression_median.txt -t  --index-col 1 --row-label RNASeq --row-prefix "RNA:" --exclude RNA:Hugo_Symbol

Connect RNASeq data to Clinical data

./example/load_matrix.py tcga-rna gbm_tcga_pub2013/data_RNA_Seq_v2_expression_median.txt -t  --index-col 1 --no-vertex --edge 'RNA:{_gid}' rna

Connect Clinical data to subtypes

./example/load_matrix.py tcga-rna gbm_tcga_pub2013/data_clinical.txt --no-vertex -e "{EXPRESSION_SUBTYPE}" subtype --dst-vertex "{EXPRESSION_SUBTYPE}" Subtype

Load Hugo Symbol to EntrezID translation table from RNA matrix annotations

./example/load_matrix.py tcga-rna gbm_tcga_pub2013/data_RNA_Seq_v2_expression_median.txt --column-include Entrez_Gene_Id --row-label Gene

Load Mutation Information

./example/load_matrix.py tcga-rna gbm_tcga_pub2013/data_mutations_extended.txt --skiprows 1 --index-col -1  --regex Matched_Norm_Sample_Barcode '\-\d\d$' '' --edge '{Matched_Norm_Sample_Barcode}' variantIn --edge '{Hugo_Symbol}' effectsGene --column-exclude ma_func.impact ma_fi.score MA_FI.score MA_Func.Impact MA:link.MSA MA:FImpact MA:protein.change MA:link.var MA:FIS MA:link.PDB --row-label Variant

Load Proneural samples into a matrix

import pandas
import gripql

conn = gripql.Connection("http://localhost:8201")
g = conn.graph("tcga-rna")
genes = {}
for k, v in g.query().V().hasLabel("Gene").render(["_gid", "Hugo_Symbol"]):
    genes[k] = v
data = {}
for row in g.query().V("Proneural").in_().out("rna").render(["_gid", "_data"]):
    data[row[0]] = row[1]
samples = pandas.DataFrame(data).rename(genes).transpose().fillna(0.0)

Matrix Load project

usage: load_matrix.py [-h] [--sep SEP] [--server SERVER]
                      [--row-label ROW_LABEL] [--row-prefix ROW_PREFIX] [-t]
                      [--index-col INDEX_COL] [--connect]
                      [--col-label COL_LABEL] [--col-prefix COL_PREFIX]
                      [--edge-label EDGE_LABEL] [--edge-prop EDGE_PROP]
                      [--columns [COLUMNS [COLUMNS ...]]]
                      [--column-include COLUMN_INCLUDE] [--no-vertex]
                      [-e EDGE EDGE] [--dst-vertex DST_VERTEX DST_VERTEX]
                      [-x EXCLUDE] [-d]
                      db input

positional arguments:
  db                    Destination Graph
  input                 Input File

optional arguments:
  -h, --help            show this help message and exit
  --sep SEP             TSV delimiter
  --server SERVER       Server Address
  --row-label ROW_LABEL
                        Vertex Label used when loading rows
  --row-prefix ROW_PREFIX
                        Prefix added to row vertex gid
  -t, --transpose       Transpose matrix
  --index-col INDEX_COL
                        Column number to use as index (and gid for vertex
  --connect             Switch to 'fully connected mode' and load matrix cell
                        values on edges between row and column names
  --col-label COL_LABEL
                        Column vertex label in 'connect' mode
  --col-prefix COL_PREFIX
                        Prefix added to col vertex gid in 'connect' mode
  --edge-label EDGE_LABEL
                        Edge label for edges in 'connect' mode
  --edge-prop EDGE_PROP
                        Property name for storing value when in 'connect' mode
  --columns [COLUMNS [COLUMNS ...]]
                        Rename columns in TSV
  --column-include COLUMN_INCLUDE
                        List subset of columns to use from TSV
  --no-vertex           Do not load row as vertex
  -e EDGE EDGE, --edge EDGE EDGE
                        Create an edge the connected the current row vertex
                        args: <dst> <edgeType>
  --dst-vertex DST_VERTEX DST_VERTEX
                        Create a destination vertex, args: <dstVertex>
  -x EXCLUDE, --exclude EXCLUDE
                        Exclude row id
  -d                    Run in debug mode. Print actions and make no changes