With Python, can I keep a persistent dictionary and modify it?


13

So, I want to store a dictionary in a persistent file. Is there a way to use regular dictionary methods to add, print, or delete entries from the dictionary in that file?

It seems that I would be able to use cPickle to store the dictionary and load it, but I'm not sure where to take it from there.

  • When you read the pickle documentation, what questions did you have? Can you post some code to show what you have working and what you need help with?

    S.Lott, August 4, 2009, 18:13
  • Basically, I would want to use the dictionary as a database type thing. So I could write the dictionary to a file, and then load the file in my script when I wanted to add something to the dictionary, but using regular dictionary methods.

    Is there a way I can just load the file, and then modify the dictionary with the typical

    dict["key"] = "items" or del dict["key"]?

    I've tried to do this now, and Python tells me that dict is undefined in this particular example.

    August 5, 2009, 12:53

7 answers

18

If your keys (not necessarily the values) are strings, the shelve standard library module does what you want pretty seamlessly.
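A minimal sketch of what that looks like in practice, assuming Python 3.4 or later (where shelve.open works as a context manager). The path and the keys are made up for illustration:

```python
import os
import shelve
import tempfile

# Hypothetical location for the shelf; any writable path works.
path = os.path.join(tempfile.mkdtemp(), "mydata")

# Open (or create) the shelf and use it like a dictionary.
with shelve.open(path) as db:
    db["key"] = "items"        # add an entry
    db["other"] = [1, 2, 3]    # values can be any picklable object
    del db["other"]            # delete an entry

# Reopen later: the entries are read back from disk.
with shelve.open(path) as db:
    stored = dict(db)

print(stored)
```

One caveat: mutating a stored value in place (for example, appending to a stored list) is not written back unless the shelf is opened with writeback=True or the value is reassigned to its key.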

  • The pleasure is all mine, Stefano! -)

    Alex Martelli, August 5, 2009, 01:30
1

If having only strings as keys (as allowed by the shelve module) is not enough, FileDict might be a good way to solve this problem.

6

Unpickle from file when program loads, modify as a normal dictionary in memory while program is running, pickle to file when program exits? Not sure exactly what more you're asking for here.
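A rough sketch of that load, modify, save cycle, assuming Python 3 (where the module is pickle rather than cPickle). The helper names load_db and save_db and the file path are hypothetical:

```python
import os
import pickle
import tempfile

# Hypothetical file name used only for this sketch.
DB_PATH = os.path.join(tempfile.mkdtemp(), "db.pickle")

def load_db(path):
    """Return the pickled dictionary, or an empty one if no file exists yet."""
    if os.path.isfile(path):
        with open(path, "rb") as fh:
            return pickle.load(fh)
    return {}

def save_db(db, path):
    """Write the whole dictionary back to disk."""
    with open(path, "wb") as fh:
        pickle.dump(db, fh)

db = load_db(DB_PATH)    # at program start
db["key"] = "items"      # ordinary dictionary operations in memory
db["temp"] = 1
del db["temp"]
save_db(db, DB_PATH)     # at program exit

reloaded = load_db(DB_PATH)
print(reloaded)
```

Nothing reaches the disk until save_db runs, so a crash before that point loses the changes made in that session.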

1

My favorite method (which does not use standard Python dictionary functions): Read/write YAML files using PyYAML. See this answer for details, summarized here:

Create a YAML file, "employment.yml":

new jersey:
  mercer county:
    plumbers: 3
    programmers: 81
  middlesex county:
    salesmen: 62
    programmers: 81
new york:
  queens county:
    plumbers: 9
    salesmen: 36

Read it in Python:

import yaml

# The with block closes the file even if parsing fails
with open("employment.yml") as file_handle:
    my__dictionary = yaml.safe_load(file_handle)

and now my__dictionary has all the values. If you needed to do this on the fly, create a string containing YAML and parse it with yaml.safe_load.

0

Pickling has one disadvantage: it can be expensive if your dictionary is large and has to be read from and written to disk frequently. pickle dumps the whole structure at once, and unpickling loads the whole structure at once.

If you only have to handle small dicts, pickle is OK. If you are going to work with something more complex, go for Berkeley DB. It is basically made to store key:value pairs.
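As an illustration of the key:value store idea, here is a sketch that swaps in the standard library dbm module instead of Berkeley DB itself (dbm picks whichever backend is available on the system, so this is not the Berkeley DB API). The path and keys are invented for the example, and Python 3.4 or later is assumed:

```python
import dbm
import os
import tempfile

# Hypothetical location for the database file(s).
path = os.path.join(tempfile.mkdtemp(), "store")

# "c" opens the database for reading and writing, creating it if needed.
with dbm.open(path, "c") as db:
    db["plumbers"] = "9"     # str keys and values are stored as bytes
    db["salesmen"] = "36"
    del db["salesmen"]       # delete a single entry

# Reopen read-only and look up one entry without loading everything.
with dbm.open(path, "r") as db:
    value = db["plumbers"]   # comes back as bytes
    keys = list(db.keys())

print(value, keys)
```

Unlike a pickled dict, keys and values here are limited to strings or bytes, so anything more structured has to be serialized first.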

1

Assuming the keys and values have working implementations of repr, one solution is to save the string representation of the dictionary (repr(dict)) to a file. You can load it using the eval function (eval(inputstring)). There are two main disadvantages of this technique:

1) It will not work with types that have an unusable implementation of repr (or it may seem to work, but fail). You'll need to pay at least some attention to what is going on.

2) Your file-load mechanism is basically straight-out executing Python code. Not great for security unless you fully control the input.

It has 1 advantage: Absurdly easy to do.
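A minimal sketch of the repr approach, with one deliberate substitution: it loads with ast.literal_eval rather than eval. literal_eval only accepts Python literals (strings, numbers, lists, dicts, tuples, sets, booleans, None), which sidesteps the code-execution concern but also means custom types will not round-trip. The file name and sample data are made up:

```python
import ast
import os
import tempfile

# Hypothetical file name and sample data for this sketch.
path = os.path.join(tempfile.mkdtemp(), "db.txt")
db = {"hello": 123, "foo": [1, 2, 3], "bar": {"a": 0, "b": 9}}

# Save: write the string representation of the dictionary.
with open(path, "w") as fh:
    fh.write(repr(db))

# Load: parse the text back into a dictionary.
# ast.literal_eval is used here in place of eval; it rejects anything
# that is not a plain literal instead of executing it.
with open(path) as fh:
    text = fh.read()
loaded = ast.literal_eval(text)

print(loaded == db)
```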

8

Use JSON

Similar to Pete's answer, I like using JSON because it maps very well to Python data structures and is very readable:

Persisting data is trivial:

>>> import json
>>> db = {'hello': 123, 'foo': [1,2,3,4,5,6], 'bar': {'a': 0, 'b':9}}
>>> fh = open("db.json", 'w')
>>> json.dump(db, fh)
>>> fh.close()

and loading is about the same:

>>> import json
>>> fh = open("db.json", 'r')
>>> db = json.load(fh)
>>> fh.close()
>>> db
{'hello': 123, 'bar': {'a': 0, 'b': 9}, 'foo': [1, 2, 3, 4, 5, 6]}
>>> del db['foo'][3]
>>> db['foo']
[1, 2, 3, 5, 6]

Additionally, loading JSON does not suffer from the same security issues as shelve and pickle, although IIRC it is slower than pickle.

If you want to write on every operation:

If you want to save on every operation, you can subclass the Python dict object:

import os
import json

class DictPersistJSON(dict):
    def __init__(self, filename, *args, **kwargs):
        self.filename = filename
        self._load()
        self.update(*args, **kwargs)

    def _load(self):
        # Read existing data without triggering a write for every key
        if (os.path.isfile(self.filename)
                and os.path.getsize(self.filename) > 0):
            with open(self.filename, 'r') as fh:
                dict.update(self, json.load(fh))

    def _dump(self):
        with open(self.filename, 'w') as fh:
            json.dump(self, fh)

    def __getitem__(self, key):
        return dict.__getitem__(self, key)

    def __setitem__(self, key, val):
        dict.__setitem__(self, key, val)
        self._dump()

    def __delitem__(self, key):
        # Deletions are persisted too
        dict.__delitem__(self, key)
        self._dump()

    def __repr__(self):
        dictrepr = dict.__repr__(self)
        return '%s(%s)' % (type(self).__name__, dictrepr)

    def update(self, *args, **kwargs):
        # Apply all changes in memory first, then write the file once
        dict.update(self, *args, **kwargs)
        self._dump()

Which can be used like this:

db = DictPersistJSON("db.json")
db["foo"] = "bar" # Will trigger a write

Which is woefully inefficient, but can get you off the ground quickly.